Abstract

This paper is concerned with the following problem: Given a stochastic non-linear system controlled 
over a noisy channel, what is the largest class of channels for which there exist coding and control 
policies so that the closed loop system is stochastically stable? Stochastic stability notions considered 
are stationarity, ergodicity or asymptotic mean stationarity. We do not restrict the state space to be 
compact, for example systems considered can be driven by unbounded noise. Necessary and sufficient 
conditions are obtained for a large class of systems and channels. A generalization of Bode’s Integral 
Formula for a large class of non-linear systems and information channels is obtained. The findings 
generalize existing results for linear systems. 


1 Introduction 

Consider an $N$-dimensional controlled non-linear system described by the discrete-time equations

$$x_{t+1} = f(x_t, u_t, w_t), \qquad (1)$$

for a (Borel measurable) function $f$, with $\{w_t\}$ being an independent and identically distributed (i.i.d.) system noise process with $w_t \sim \nu$.

This system is connected over a noisy channel with a finite capacity to a controller, as shown in Figure 1.
The controller has access to the information it has received through the channel. A source coder maps
the source symbols, state values, to corresponding channel inputs. The channel inputs are transmitted
through a channel; we assume that the channel is a finite alphabet channel with input alphabet $\mathcal{M}$ and
output alphabet $\mathcal{M}'$.

By a Coding Policy $\Pi$ we mean a sequence of functions $\{\gamma^e_t, t \geq 0\}$ which are causal, such that the channel input at time $t$, $q_t \in \mathcal{M}$, under $\Pi^{comp}$ is generated by a function of its local information, that is,

$$q_t = \gamma^e_t(\mathcal{I}^e_t),$$

where $\mathcal{I}^e_t = \{x_{[0,t]}, q'_{[0,t-1]}\}$ and $q_t \in \mathcal{M}$, the channel input alphabet given by $\mathcal{M} := \{1, 2, \ldots, M\}$, for $0 \leq t \leq T-1$. Here, we have the notation for $t \geq 1$: $x_{[0,t-1]} = \{x_s, 0 \leq s \leq t-1\}$.

The channel maps $q_t$ to $q'_t$ in a stochastic fashion so that $P(q'_t \mid q_t, q_{[0,t-1]}, q'_{[0,t-1]})$ is a conditional probability measure on $\mathcal{M}'$ for all $t \in \mathbb{Z}_+$. If this expression is equal to $P(q'_t \mid q_t)$, the channel is said to be a memoryless channel, that is, the past variables do not affect the channel output $q'_t$ given the current channel input $q_t$. Even though in this paper we will consider discrete alphabet channels, the analysis is also applicable to a large class of continuous alphabet channels (through an appropriate quantized approximation of the channel; see e.g. [?]).

Figure 1: Control of a system over a noisy channel.

The receiver/controller, upon receiving the information from the channel, generates its decision at time $t$, also causally: an admissible causal controller policy is a sequence of functions $\gamma = \{\gamma_t\}$ such that

$$\gamma_t : \mathcal{M}'^{\,t+1} \to \mathbb{R}^m, \qquad t \geq 0,$$

so that $u_t = \gamma_t(q'_{[0,t]})$. We call such encoding and control policies causal or admissible.

In the networked control literature, the goal in the encoder/controller design is typically either to 
optimize the system according to some performance criterion or stabilize the system. For stabilization, 
linear systems have been studied extensively where the goal has been to identify conditions so that the 
controlled state is stochastically stable, as we review briefly later. 

This paper is concerned with necessary and sufficient conditions on information channels in a networked control system for which there exist coding and control policies such that the controlled system is stochastically stable in one or more of the following senses: (i) the state $\{x_t\}$ and the coding and control parameters lead to a stable (positive Harris recurrent) Markov chain, (ii) $\{x_t\}$ is asymptotically stationary, or asymptotically mean stationary (AMS), and satisfies Birkhoff's sample path ergodic theorem (see Section A for a review of these concepts), (iii) $\{x_t\}$ is ergodic.

In the remainder of this section, we will be providing a literature review, first for non-linear systems 
and then briefly for linear systems in the context of the goals of this paper and highlight the contributions 
of the paper. Section [2] develops some supporting results and a generalization of Bode’s Integral Formula 
for non-linear systems and general information channels. Section [3] develops conditions for ergodicity and 
asymptotic mean stationarity of the controlled system. Section [4] establishes conditions for stationarity 
of the controlled system under structured (stationary) coding and control policies. Section [5] presents an 
ergodic construction for a non-linear system driven by additive Gaussian noise and controlled over discrete 
noiseless channels. Section [ 6 ] contains some concluding remarks. 

1.1 Some notation and preliminaries 

Let $x$ be an $\mathbb{X}$-valued random variable, where $\mathbb{X}$ is countable. The entropy of $x$ is defined as $H(x) = -\sum_{z \in \mathbb{X}} p(z) \log_2(p(z))$, where $p$ is the probability mass function (pmf) of the random variable $x$. If $x$ is an $\mathbb{R}^n$-valued random variable, and the probability measure induced by $x$ is absolutely continuous with respect to the Lebesgue measure, the (differential) entropy of $x$ is defined by $h(x) = -\int_{\mathbb{R}^n} p(x) \log_2(p(x))\,dx$, where $p(\cdot)$ is the probability density function (pdf) of $x$.
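As a small illustration of the discrete entropy definition above, the following sketch evaluates $H(x)$ in bits for a finite pmf. The function name `entropy_bits` and the example pmf are our own illustrative choices and do not appear in the paper.

```python
import math

def entropy_bits(pmf):
    """Shannon entropy H(x) = -sum_z p(z) log2 p(z), in bits, for a finite pmf."""
    if any(p < 0 for p in pmf) or abs(sum(pmf) - 1.0) > 1e-9:
        raise ValueError("pmf must be non-negative and sum to 1")
    # Terms with p(z) = 0 are skipped (convention: 0 log 0 := 0).
    return -sum(p * math.log2(p) for p in pmf if p > 0)

# A uniform pmf on four symbols carries two bits.
uniform4 = entropy_bits([0.25, 0.25, 0.25, 0.25])
```

A deterministic variable (all mass on one symbol) gives zero entropy under the same function.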

The mutual information between a discrete (continuous) random variable $x$ and another discrete (continuous) random variable $y$, defined on a common probability space, is defined as $I(x; y) = H(x) - H(x \mid y)$, where $H(x)$ is the entropy of $x$ (differential entropy if $x$ is a continuous random variable), and $H(x \mid y)$ is the conditional entropy of $x$ given $y$ ($h(x \mid y)$ is the conditional differential entropy if $x$ is a continuous random variable). For more general settings, including when the random variables are continuous, discrete or a mixture of the two, mutual information is defined as $I(x; y) := \sup_{Q_1, Q_2} I(Q_1(x); Q_2(y))$, where $Q_1$ and $Q_2$ are quantizers with finitely many bins (see Chapter 5 in [16]). An important relevant result is the following. Let $x$ be a random variable and $Q$ be a quantizer applied to $x$. Then, $H(Q(x)) = I(x; Q(x)) = h(x) - h(x \mid Q(x))$. For a concise overview of relevant information theoretic concepts, we refer the reader to Chapter 5 of [?]. For a more complete coverage, see [14] or [8]. When the realization $x$ of a random variable $x_t$ needs to be explicitly mentioned, the event $x_t = x$ will be emphasized. We use the conditional probability (expectation) notation $P_x(\cdot)$ ($E_x[\cdot]$) to denote $P(\cdot \mid x_0 = x)$ ($E[\cdot \mid x_0 = x]$). Finally, for a square matrix $A$, $|A|$ denotes the absolute value of its determinant.

Throughout the paper, all the random variables will be defined on a common probability space $(\Omega, \mathcal{F}, P)$.

1.2 Literature review 

In the literature, studies of non-linear systems have typically considered noise-free systems controlled over discrete noiseless channels. Many of the studies on control of non-linear systems over communication channels have focused on constructive schemes (and not on converse theorems), primarily for noise-free sources and channels; see, e.g., [?], [?], and [48]. For noise-free systems, it typically suffices to consider only a sufficiently small invariant neighborhood of an equilibrium point to obtain stabilizability conditions.

One important problem which has not yet been addressed to our knowledge is to obtain converse (or 
impossibility) theorems: The question of when an open-loop unstable non-linear stochastic control system 
can or cannot be made ergodic or asymptotically mean stationary subject to information constraints has 
not been addressed. 

Entropy based arguments (which are crucial in obtaining fundamental bounds in information theory and ergodic theory) can be used to obtain converse results: the entropy, as a measure of uncertainty growth, of a dynamical system has two related interpretations: a topological (distribution-free / geometric) one and a measure-theoretic (probabilistic) one. Although the analysis in this paper is probabilistic, we provide a short discussion on the topological entropy: the distribution-free entropy notion (see, e.g., [23]) for a dynamical system taking values in a compact metric space is concerned with the time-normalized number of paths/orbits, distinguishable at some finite resolution $\epsilon > 0$, that the system can take as the time horizon increases and $\epsilon \to 0$. With such a distribution-free setup, [?] studied the stabilization of deterministic systems controlled over discrete noiseless finite capacity channels: the topological entropy gives a measure of the number of distinct control inputs needed to make a compact set invariant for a noise-free system. [?] extends the notion of topological entropy to controlled dynamical systems, and develops the notion of feedback entropy or invariance entropy [7]; see also [5] for related results. [?] defines two notions of invariance for a set $K$. A set can be made weakly invariant if there exists $t > 0$ such that for every $x_0 \in K$, there exists a sequence of control actions so that $x_t \in K' \subset \mathrm{interior}(K)$. Strong invariance of $K$ requires that $x_1 \in K'$. With a relaxation of deterministic controls, [52] has studied invariance entropy for random dynamical systems, and [35] has generalized the topological entropy theoretic results to include random dynamical models to obtain an observability condition over discrete channels. For a comprehensive discussion of such a geometric interpretation of entropy in controlled systems, see [24]. The results for deterministic systems pose questions on set stability which are not sufficient to study stochastic setups. A stochastic formulation also allows for control over general noisy channels, and is thus suitable for establishing connections with information theory (we note that a distribution-free counterpart for such studies requires one to investigate zero-error capacity formulations [35]; however, many practical channels, including erasure channels, have zero zero-error capacity).

On the other hand, the measure-theoretic (also known as Kolmogorov-Sinai or metric) entropy is more relevant to information-theoretic as well as random noise-driven stochastic contexts since in this case, one considers the typical distinguishable paths/orbits of a dynamical system and not all of the sample paths a dynamical system may take (and hence the topological entropy typically provides upper bounds on the measure-theoretic entropy). Measure-theoretic entropy is crucial in the celebrated Shannon-McMillan-Breiman theorem [?] as well as the isomorphism theorem [35], [23]. For further relations between different interpretations of entropy as well as their computations (such as through Lyapunov exponents as a result of Pesin's formula), we refer the reader to [?]. Such an entropy notion has operational practical usage in identifying fundamental limits on source and channel coding for stationary sources [?]. However, the findings in the information theory literature have not yet been successfully applied to non-linear networked control systems in general, for the following reasons: (i) The open-loop system in networked control may be unstable and stabilizable only through a control loop. In the information theory literature, stochastic stability results for coding schemes have been established primarily for (control-free) stable sources and, when non-stationary, have involved only linear Gaussian auto-regressive (AR) processes [18]. Moreover, such a control-free analysis does not lead to conclusive results for non-linear controlled sources since non-linear systems suffer from the dual-effect: one cannot decouple estimation from control, and control from conditional entropy properties under a stationary probability measure. (ii) The coding schemes for such studies in information theory are non-causal; in networked control systems, coding must be causal (that is, real-time or essentially zero-delay [?]).

There have been few studies which have adopted a measure-theoretic entropic view for the control of non-linear dynamical systems over communication channels. Relevant contributions include [36] and [?]: building on [?] and [69], [36] develops an entropy analysis for non-linear system dynamics to obtain the relation between the entropy rates of a measurement disturbance, output and the dynamical system, and generalizes a Bode-type entropy analysis to non-linear systems. A related entropy analysis for a class of stochastic non-linear systems has been considered in [63]. Recently, [59] and [58] have considered fading and erasure channels between the controller and the actuator and have studied ergodicity properties using Lyapunov theoretic arguments under a class of structures imposed on control policies; these contributions do not consider finite-rate information and coding restrictions which may arise due to the presence of a channel. Other important relevant works which consider deterministic systems are [29] and [28], where the stability of zooming schemes, as in [3], has been considered.

Finally, we note an important related discussion in view of Bode's integral formula as extended to a class of non-linear systems in [65] under somewhat restrictive conditions, see [?, Thm. 9]. Relevant work includes [12], [?] and [43] for linear systems. For non-linear systems, the entropy and mutual information arguments provide the appropriate fundamental bounds instead of a sensitivity integral/transfer function analysis which is commonly used for linear systems, as is also advocated in [69]. An earlier contribution utilizing measure theoretic entropy for the study and classification of controlled stochastic systems is [?]. The findings in our paper provide further generalizations; see Theorem 2.2 and Remark 4.

The stability criteria outlined earlier have been studied extensively for linear systems of the form

$$x_{t+1} = A x_t + B u_t + G w_t, \qquad (2)$$

where $x_t \in \mathbb{R}^N$ is the state at time $t$, $u_t \in \mathbb{R}^m$ is the control input, and $\{w_t\}$ is a sequence of i.i.d. $\mathbb{R}^d$-valued random vectors (such as Gaussian). Here, $(A,B)$ and $(A,G)$ are controllable pairs.

For noise-free linear systems controlled over discrete noiseless channels, Wong and Brockett [?], Baillieul [?], and more generally, Tatikonda and Mitter [56] (see also [55]) and Nair and Evans [?] have obtained the minimum rate needed for stabilization over a class of communication channels under various assumptions on the system noise and channels; this result is sometimes referred to as a data-rate theorem. This theorem states that for stabilizability under information constraints, in the mean-square sense, the average rate per time stage has to be at least $\sum_{|\lambda_i| > 1} \log_2(|\lambda_i|)$, where $\{\lambda_i, 1 \leq i \leq N\}$ are the eigenvalues of $A$.
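The data-rate threshold above is straightforward to evaluate once the eigenvalues of $A$ are known. The sketch below assumes the eigenvalues are supplied directly as a list of real or complex numbers; the function name `min_stabilizing_rate` and the example values are hypothetical.

```python
import math

def min_stabilizing_rate(eigenvalues):
    """Sum of log2|lambda_i| over the eigenvalues with |lambda_i| > 1 (bits per time stage)."""
    return sum(math.log2(abs(lam)) for lam in eigenvalues if abs(lam) > 1)

# Eigenvalues 2 and -4 are unstable, 0.5 is stable and does not contribute:
# log2(2) + log2(4) = 3 bits per time stage.
rate = min_stabilizing_rate([2.0, -4.0, 0.5])
```

Stable modes are excluded from the sum, so adding or removing eigenvalues inside the unit circle leaves the threshold unchanged.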





The particular notion of stochastic stability is crucial in characterizing the conditions on the channels, and important extensions have been made in the literature, notably by Matveev and Savkin [53], [35], Sahai and Mitter [35], [39], and Martins et al. [32]. For a more comprehensive review, see [42], Chapters 5-8 of [?], [?], and [?]. Reference [39] considered erasure channels and obtained necessary and sufficient time-varying rate conditions for control over such channels. Reference [9] considered second moment stability over a class of Markov channels with feedback. Motivated by such problems, [63] and [68] developed a martingale method for establishing stochastic stability, which later led to a random-time state-dependent drift criterion, leading to the existence of an invariant distribution possibly with moment constraints; these were utilized to obtain policies leading to strong forms of stochastic stability, such as ergodicity or positive Harris recurrence [?], for linear systems driven by additive unbounded noise.

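The following definition (see [?, Definition 8.5.1]) will be useful in the analysis later in the paper.

Definition 1.1 Channels are said to be of Class A type, if

• they satisfy the following Markov chain condition:

$$q'_t \leftrightarrow q_t, q_{[0,t-1]}, q'_{[0,t-1]} \leftrightarrow \{x_0, w_t, t \geq 0\}, \qquad (3)$$

that is, almost surely, for all Borel sets $B$,

$$P(q'_t \in B \mid q_t, q_{[0,t-1]}, q'_{[0,t-1]}, x_0, w_s, s \geq 0) = P(q'_t \in B \mid q_t, q_{[0,t-1]}, q'_{[0,t-1]}),$$

for all $t \geq 0$, and

• their capacity with feedback is given by:

$$C = \lim_{T \to \infty} \max_{\{P(q_t \mid q_{[0,t-1]}, q'_{[0,t-1]}),\ 0 \leq t \leq T-1\}} \frac{1}{T} I(q_{[0,T-1]} \to q'_{[0,T-1]}),$$

where the directed mutual information is defined by

$$I(q_{[0,T-1]} \to q'_{[0,T-1]}) = \sum_{t=1}^{T-1} I(q_{[0,t]}; q'_t \mid q'_{[0,t-1]}) + I(q_0; q'_0).$$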

Memoryless channels belong to this class; for such channels, feedback does not increase the capacity [8]. Such a class also includes finite state stationary Markov channels which are indecomposable [35], and non-Markov channels which satisfy certain symmetry properties [?]. Further examples can be found in [57] and in [?].

Theorem 1.1 [?], [?] Consider the multi-dimensional linear system (2). For such a system controlled over a Class A type noisy channel with feedback, if the channel capacity satisfies

$$C < \sum_{|\lambda_i| > 1} \log_2(|\lambda_i|),$$

(i) there does not exist a stabilizing coding and control scheme with the property $\liminf_{T \to \infty} \frac{1}{T} h(x_T) \leq 0$,

(ii) the system cannot be made AMS or ergodic (see Section A).

For sufficiency, assume that $A$ is a diagonalizable matrix (a sufficient condition for which is that its eigenvalues are real and distinct).




Theorem 1.2 [?], [?] Consider the multi-dimensional linear system (2) with a diagonalizable matrix $A$ and Gaussian noise, controlled over a discrete memoryless channel. If the Shannon capacity of the channel satisfies

$$C > \sum_{|\lambda_i| > 1} \log_2(|\lambda_i|),$$

there exists a stabilizing scheme which makes the process $\{x_t\}$ AMS. If the channel is noiseless, or a memoryless erasure channel, or is a Gaussian channel, then the process $\{x_t\}$ can be made stationary and ergodic.
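Theorems 1.1 and 1.2 together compare a channel capacity with the rate $\sum_{|\lambda_i|>1}\log_2(|\lambda_i|)$. As a rough illustration, the sketch below uses the standard capacity formulas for the binary symmetric and binary erasure channels (our choice of examples, both memoryless) and reports on which side of the threshold a channel falls. The helper names are hypothetical, and the "achievable" label presumes the additional hypotheses of Theorem 1.2 (diagonalizable $A$, Gaussian noise, discrete memoryless channel).

```python
import math

def binary_entropy(p):
    """Binary entropy function in bits, with the convention 0 log 0 := 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(crossover):
    """Capacity (bits per use) of a binary symmetric channel."""
    return 1.0 - binary_entropy(crossover)

def bec_capacity(erasure):
    """Capacity (bits per use) of a binary erasure channel."""
    return 1.0 - erasure

def compare_with_threshold(capacity, required_rate):
    """Which regime applies: converse (Thm 1.1), achievability (Thm 1.2), or neither."""
    if capacity < required_rate:
        return "impossible"   # Theorem 1.1: cannot be made AMS or ergodic
    if capacity > required_rate:
        return "achievable"   # Theorem 1.2, under its stated assumptions
    return "boundary"         # equality is covered by neither statement

# A scalar system with eigenvalue 2 needs more than 1 bit per time stage,
# which no binary-input channel with erasures can supply.
verdict = compare_with_threshold(bec_capacity(0.5), 1.0)
```

The boundary case is reported separately because the two theorems use strict inequalities in opposite directions.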

1.3 Contributions of the paper 

As stated above, stochastic stabilization of non-linear systems driven by noise (especially unbounded noise) over communication channels has not been studied, to our knowledge, where the goal is to establish asymptotic (mean) stationarity, ergodicity or stationarity of the closed-loop system. We use measure-theoretic entropy analysis and ergodic theoretic tools to arrive at necessary and sufficient conditions. A by-product of the analysis is a generalization of Bode's Integral Formula to a class of non-linear systems and arbitrary information channels with memory. The approach in the paper, although building on our earlier work on linear systems, contains significant generalizations due to the non-linearity of the source. We also consider a construction of a stabilizing coding and control scheme for multi-dimensional non-linear sources driven by unbounded noise controlled over a discrete noiseless channel.


2 Sublinear entropy growth and a generalization of Bode’s Integral 
Formula for non-linear systems 

In the paper, instead of a general $\mathbb{R}^N$-valued non-linear state model

$$x_{n+1} = f(x_n, u_n, w_n), \qquad (4)$$

we will consider non-linear systems of the form

$$x_{n+1} = f(x_n, w_n) + B u_n, \qquad (5)$$

$$x_{n+1} = f(x_n) + B u_n + w_n, \qquad (6)$$

$$x_{n+1} = f(x_n, u_n) + w_n. \qquad (7)$$

We will also have occasion to discuss non-linear systems of the form

$$x_{n+1} = f(x_n, u_n). \qquad (8)$$

In all of the models above, $x_n$ is the $\mathbb{R}^N$-valued state, $w_n$ is the $\mathbb{R}^N$-valued noise variable, $u_n$ is $\mathbb{R}^m$-valued, and $\{w_n\}$ is assumed to be an independent noise process with $w_n \sim \nu$.

We assume throughout that $f$ is measurable and continuously differentiable in the state variable. For a possibly non-linear differentiable function $f : \mathbb{R}^n \to \mathbb{R}^m$, the Jacobian matrix of $f$ is an $m \times n$ matrix function consisting of partial derivatives of $f$ such that

$$J(f)(i,j) = \frac{\partial (f(x))_i}{\partial x_j}, \qquad 1 \leq i \leq m, \; 1 \leq j \leq n.$$

We will have the following assumption throughout the paper. 
Assumption 2.1 In the models considered above, $f(\cdot, w) : \mathbb{R}^N \to \mathbb{R}^N$ is invertible for every realization of $w$.

In the following, $|J(f)|$ will denote the absolute value of the determinant of the Jacobian. Furthermore, with $f_w(x) = f(x,w)$, we define $J(f(x,w)) := J(f_w(x))$.

Assumption 2.2 There exist $M_1 \in \mathbb{R}$ and $L_1 \in \mathbb{R}$ so that for all $x, w$

$$L_1 \leq \log_2(|J(f(x,w))|) \leq M_1.$$
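To make Assumptions 2.1 and 2.2 concrete, consider the hypothetical scalar map $f(x,w) = 3x + \sin(x) + w$, which is our own example and not taken from the paper. Its derivative in $x$ is $3 + \cos(x) \in [2, 4]$, so the map is strictly increasing (hence invertible for every $w$) and $\log_2|J|$ lies in $[1, 2]$. The sketch below confirms these bounds numerically on a grid.

```python
import math

A_COEFF = 3.0  # hypothetical slope; any value > 1 keeps the derivative positive

def f(x, w):
    """Illustrative scalar dynamics f(x, w) = a*x + sin(x) + w."""
    return A_COEFF * x + math.sin(x) + w

def log2_abs_jacobian(x, w):
    """log2 |d f(x, w) / dx|; for this example it does not depend on w."""
    return math.log2(abs(A_COEFF + math.cos(x)))

# Grid over [-2*pi, 2*pi]; it contains x = 0 (maximum) and x = +-pi (minimum).
grid = [k * math.pi / 100 for k in range(-200, 201)]
values = [log2_abs_jacobian(x, 0.0) for x in grid]
L1, M1 = min(values), max(values)
```

For this example one may take $L_1 = 1$ and $M_1 = 2$ in Assumption 2.2; the grid search only illustrates the bounds and is not a proof.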

The following is our first result; it provides conditions for sublinear entropy growth (in time), a property which is implied by quadratic stability (see Remark 1). The result will also be used in the next section and its proof leads to a generalization of Bode's Integral Formula as discussed further below. Let $\pi_t(B) = P(x_t \in B)$ for all Borel $B$.


Theorem 2.1 Consider the networked control problem over a Class A channel. (i) Let $f$ have the form in (5), (ii) let Assumptions 2.1 and 2.2 hold, and (iii) let $x_0$ have finite differential entropy.

a) If there is an admissible coding and control policy such that

$$\liminf_{t \to \infty} h(x_t)/t \leq 0,$$

it must be that

$$C \geq \liminf_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \int \pi_t(dx) \Big( \int \nu(dw) \log_2(|J(f(x,w))|) \Big). \qquad (9)$$

b) If there is an admissible coding and control policy such that

$$\limsup_{t \to \infty} h(x_t)/t \leq 0,$$

it must be that

$$C \geq \limsup_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \int \pi_t(dx) \Big( \int \nu(dw) \log_2(|J(f(x,w))|) \Big). \qquad (10)$$

In either case, if $L := \inf_{x,w} \log_2 |J(f(x,w))|$, then $C \geq L$.
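As a sanity check on the bounds in Theorem 2.1, consider the linear instance $f(x,w) = Ax + w$ with $A$ invertible, which is our illustrative choice and satisfies Assumptions 2.1 and 2.2. The Jacobian is then constant, so the averages over $\pi_t$ and $\nu$ drop out:

```latex
J(f(x,w)) = A \quad \text{for all } x, w
\;\;\Longrightarrow\;\;
\frac{1}{T}\sum_{t=0}^{T-1} \int \pi_t(dx) \int \nu(dw)\, \log_2 |J(f(x,w))|
  = \log_2 |\det A|
  = \sum_{i=1}^{N} \log_2 |\lambda_i| .
```

Both (9) and (10) therefore reduce to $C \geq \sum_{i=1}^{N} \log_2|\lambda_i|$, and $L = \log_2|\det A|$. When every eigenvalue satisfies $|\lambda_i| \geq 1$, this coincides with the threshold appearing in Theorem 1.1; when some eigenvalues are stable, their negative contributions make this particular bound looser than that threshold.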


Remark 1 The condition $\limsup_{t \to \infty} h(x_t)/t \leq 0$ is a weak condition. For example, a stochastic process whose second moment grows subexponentially in time, so that $\limsup_{t \to \infty} \frac{\log(E[x_t^2])}{t} \leq 0$, satisfies this condition. Hence, quadratic stability implies this condition. ⋄
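The claim in Remark 1 can be traced, in the scalar case, to the standard fact that a Gaussian maximizes differential entropy for a given variance. A short derivation under that assumption (scalar $x_t$ with a density and $E[x_t^2] > 0$):

```latex
h(x_t) \;\le\; \tfrac{1}{2}\log_2\!\big(2\pi e\,\mathrm{Var}(x_t)\big)
       \;\le\; \tfrac{1}{2}\log_2\!\big(2\pi e\,E[x_t^2]\big),
\qquad\text{so}\qquad
\limsup_{t\to\infty}\frac{h(x_t)}{t}
  \;\le\; \limsup_{t\to\infty}\frac{\log_2(2\pi e)}{2t}
        + \frac{1}{2}\limsup_{t\to\infty}\frac{\log_2 E[x_t^2]}{t}
  \;\le\; 0 .
```

In particular, if $\sup_t E[x_t^2] < \infty$ (quadratic stability), the right-hand side vanishes and the entropy condition holds.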


Remark 2 In the theorem, we would have obtained the same results if we had replaced $\limsup_{t \to \infty} h(x_t)/t \leq 0$ with $\limsup_{t \to \infty} \frac{1}{t} h(x_t \mid q'_{[0,t-1]}) \leq 0$. This condition would be more relevant for state estimation problems, where the goal is not necessarily to make the state stable, but to make the estimation error stable (where $u_t$ would be the state estimate and $x_t - u_t$ would be the estimation error). Since $h(x_t \mid q'_{[0,t-1]}) \leq h(x_t)$, it is evident that the condition $h(x_t)/t \leq 0$ implies that $h(x_t \mid q'_{[0,t-1]})/t \leq 0$. ⋄
Proof of Theorem 2.1 Recall that for channels of the type Class A (which includes the discrete memoryless channels (DMC) as a special case), the capacity is given by:

$$C = \lim_{T \to \infty} \max_{\{P(q_t \mid q_{[0,t-1]}, q'_{[0,t-1]}),\ 0 \leq t \leq T-1\}} \frac{1}{T} I(q_{[0,T-1]} \to q'_{[0,T-1]}),$$

where

$$I(q_{[0,T-1]} \to q'_{[0,T-1]}) = \sum_{t=1}^{T-1} I(q_{[0,t]}; q'_t \mid q'_{[0,t-1]}) + I(q_0; q'_0).$$

Let us define $R_T = \max_{\{P(q_t \mid q_{[0,t-1]}, q'_{[0,t-1]}),\ 0 \leq t \leq T-1\}} \frac{1}{T} \sum_{t=0}^{T-1} I(q'_t; q_{[0,t]} \mid q'_{[0,t-1]})$. Observe that for $t > 0$:

$$\begin{aligned}
I(q'_t; q_{[0,t]} \mid q'_{[0,t-1]}) &= H(q'_t \mid q'_{[0,t-1]}) - H(q'_t \mid q_{[0,t]}, q'_{[0,t-1]}) \\
&= H(q'_t \mid q'_{[0,t-1]}) - H(q'_t \mid q_{[0,t]}, x_t, q'_{[0,t-1]}) \qquad (11) \\
&\geq H(q'_t \mid q'_{[0,t-1]}) - H(q'_t \mid x_t, q'_{[0,t-1]}) \\
&= I(x_t; q'_t \mid q'_{[0,t-1]}). \qquad (12)
\end{aligned}$$

Here, (11) follows from the assumption that the channel is of Class A type.


T—l 


a) Consider the following 

Jim Rt > lim sup ^ ( V I(x t ] «tl9fo,t-il)) + ^o; Qo) ) (I 3 ) 

T^roo T—>00 1 V 1 J / 

= limsup ^ Y (^ h ( x t\q[ 0 ,t-i]) - + I{xq\ q'o) 

1 T_1 ( N 

= lim sup - ^ I /ifo^t-i]) - /»(*t|9[o,t]) 

T—>-oo J t=1 V / 

1 T_1 /" / ' 

= lim sup - ^ ( h(f(xt-i,wt-i) + |< 3 '[ 0 ^_ 1] ) - h(x t \q[ ot] ) 

T ^°° t=i ' 

! T_1 / 

= lim sup - ^ ( /r(/(.x t _i,u; t _i)|g[ 0)t _ 1] ) - /i(®t|gj 0it] ) 

T ^°° t=i ' 

1 T_1 ( 

= lim sup - ^ ^ ^/(®t-i>«'t-i)l9[ 0 ,t- 1 ] = C[o,t-i]= C[o,t-i]) 

T—¥ OO 1 - n \ * 

1 C[o,t-i] 

-M*i|gj 0l t])) ( 1 4 ) 

> lim sup - Y( Y Kf{ x t-i,w t - l )\q' [ot _ l] = C[o,t-i], ™t-i)P(q[o,t-i} = C[o,t-i]) ) 

T—¥ OO 1 j. _1 \ > / 

1 C[o,t-i] 

= lim sup^ ]T ( ( Y / ^(/(*t-i,w)|g|o )t _i] = C[o,t-i]>«^-i = w)v{dw) 

T—>00 1 , W A J 

1 Cro.t-n 


(15) 



(16) 


Here, 


x -^*( ( 7[o,t-i] — C[o,t—i]) ) h(xt\q[ 0) t]) 


T—l 


= limsup^^f P(Q[o,t-i]=C[o,t-i]) 

T—¥OC 1 - \ 


t —i C[o,t-i 


x (/ v{dw) ^ J P(dx t _i|g[ ot _ 1] = C[o,t-i],™t-i = w)log 2 {\J{f(x t -i,w))\) 
+h{x t -i\q [0>t _ 1] = C[o,t-i],^t-l = «>))) - h( x tW[o,t})) 


T -1 


= U ™_^P^5Z( 2 P («[0,t-1] = C[0,t-1]) 


*—1 C[0,i-1 


u(dw) ^ J P(dx t -i\q[ ot _ 1] = C[ 0i t_i])log 2 (|J(/(x t _i,u;))|) 
+/i(x t _i|9j 0>t _i] = C[o,t-i]M - ^(*t|9[o,t]) 
lim sup^Wf ^ P(g[ 0 ,t-i] = C[o,i—l]) / P(d®t-i|?j 0i t_ 1] = C[o,t—l]) 

T—)-CJO f _i \ \ a J 

X J iy(dw)\og 2 (\J(f(x t - 1 ,w))\)^ +h(x t -i\q[ o t _^) - h(x t \q[ o t] ) 
lim sup — (±f n(dx)^J u(dw)log 2 (\J(f(x,w))\) S j -h(x T -i\q[ 0i 


l,T—1]. 


t =0 

1 


> Vr_li ^ > “ f f h ( X T~lW[ 0 ,T-l]) 


T—l 

V [ M dx )( [ v(dw)log 2 (\J(f(x,w))\) 


(17) 


(18) 


(19) 


( 20 ) 


( 21 ) 


Equations (14) and (16) follow from the definition of conditional entropy, and (15) follows from conditioning on the random variable $w_{t-1}$. Equations (17)-(18) follow from the fact that $x_{t-1} \leftrightarrow q'_{[0,t-1]} \leftrightarrow w_{t-1}$ is a Markov chain and the following. For every realization $q'_{[0,t-1]} = c_{[0,t-1]}$,

$$\begin{aligned}
& h(f(x_{t-1}, w_{t-1}) \mid q'_{[0,t-1]} = c_{[0,t-1]}, w_{t-1} = w) \\
&= h(f_w(x_{t-1}) \mid q'_{[0,t-1]} = c_{[0,t-1]}, w_{t-1} = w) \\
&= \int P(dx_{t-1} \mid q'_{[0,t-1]} = c_{[0,t-1]}, w_{t-1} = w) \log_2(|J(f_w(x_{t-1}))|) + h(x_{t-1} \mid q'_{[0,t-1]} = c_{[0,t-1]}, w_{t-1} = w) \qquad (22) \\
&= \int P(dx_{t-1} \mid q'_{[0,t-1]} = c_{[0,t-1]}) \log_2(|J(f(x_{t-1}, w))|) + h(x_{t-1} \mid q'_{[0,t-1]} = c_{[0,t-1]}),
\end{aligned}$$

where $f_w(x) := f(x, w)$ is an invertible function for every $w$, and as a result (22) follows from the entropy formula for invertible functions of a random variable (see, e.g., p. 167 of [53] and Lemma 4 in [69]) and the last line follows from the condition $x_{t-1} \leftrightarrow q'_{[0,t-1]} \leftrightarrow w_{t-1}$. Equation (19) follows from Fubini's theorem by Assumption 2.2.
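The entropy formula for invertible maps used in (22), $h(g(x)) = h(x) + E[\log_2|J(g)(x)|]$, can be checked on a case where both sides are known in closed form. The sketch below takes $x \sim N(\mu, \sigma^2)$ and the hypothetical map $g(x) = e^x$ (our example, not the paper's), so that $g(x)$ is lognormal; it compares the standard lognormal entropy with the Gaussian entropy plus a numerically integrated Jacobian term.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def expected_log2_jacobian(mu, sigma, n=20000):
    """E[log2 |g'(x)|] for g(x) = exp(x), i.e. E[x] * log2(e), by the midpoint rule."""
    lo, hi = mu - 10.0 * sigma, mu + 10.0 * sigma
    step = (hi - lo) / n
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * step
        total += gaussian_pdf(x, mu, sigma) * x * math.log2(math.e) * step
    return total

def gaussian_entropy_bits(sigma):
    """Differential entropy of N(mu, sigma^2) in bits."""
    return 0.5 * math.log2(2.0 * math.pi * math.e * sigma ** 2)

def lognormal_entropy_bits(mu, sigma):
    """Differential entropy of exp(N(mu, sigma^2)) in bits (standard closed form)."""
    return (mu + 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)) / math.log(2.0)

mu, sigma = 0.7, 1.3  # arbitrary illustrative parameters
lhs = lognormal_entropy_bits(mu, sigma)                               # h(g(x))
rhs = gaussian_entropy_bits(sigma) + expected_log2_jacobian(mu, sigma)  # h(x) + E[log2|J|]
```

The two sides agree up to quadrature error, which is the scalar version of the step from the second to the third line in the display above.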


By the hypothesis, $\liminf_{T \to \infty} \frac{1}{T} h(x_T) \leq 0$, it must be that $\lim_{T \to \infty} R_T \geq V$. Thus, the capacity also needs to satisfy this bound.


In the above derivation, (20) follows from the fact that for two sequences $a_n, b_n$:

$$\limsup_{n \to \infty} (a_n + b_n) \geq \limsup_{n \to \infty} a_n + \liminf_{n \to \infty} b_n. \qquad (23)$$


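b) If $\limsup_{t \to \infty} h(x_t \mid q'_{[0,t-1]})/t \leq 0$, (20) can be applied through (23) with $V$ defined as

$$\limsup_{T \to \infty} \frac{1}{T} E\Big[ \sum_{t=0}^{T-1} \int P(dx_t \mid q'_{[0,t-1]}) \Big( \int \nu(dw) \log_2(|J(f(x_t, w))|) \Big) \Big]$$

and in (20), $\liminf_{T \to \infty} \frac{1}{T} h(x_{T-1} \mid q'_{[0,T-1]})$ being replaced with the $\limsup$ of the same expression. ⋄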

Remark 3 We note that if the system had been of a model in which the control enters through $f$ (as in the models above where $u_n$ is an argument of $f$), the expression involving $J(f(x,w))$ would explicitly depend on the control policy, which would in turn depend possibly on the entire past channel outputs, making the expression computationally more involved. ⋄


2.1 A generalization of Bode’s Integral Formula for non-linear systems 


The proof of Theorem 2.1 reveals an interesting connection with and generalization of Bode's Integral
Formula (and what is known as the waterbed effect) [38] to non-linear systems, which we state formally in
the following. The result also suggests that an appropriate generalization for non-linear systems is through
an information theoretic approach that recovers Bode's original result for the linear case as we discuss
further below.


Theorem 2.2 (i) Let $f$ have the form in (5), (ii) let Assumption 2.1 hold, and (iii) let $x_0$ have finite differential entropy. If there is an admissible coding and control policy with $\limsup_{t \to \infty} h(x_t)/t \leq 0$, it must be that

$$\limsup_{T \to \infty} \frac{1}{T} I(q_{[0,T-1]} \to q'_{[0,T-1]}) \geq \limsup_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \int \pi_t(dx) \Big( \int \nu(dw) \log_2(|J(f(x, w))|) \Big). \qquad (24)$$

Proof. This follows directly from equations (11), (13) and (20).


Remark 4 [Reduction to Bode's Integral Formula for Linear Systems and Gaussian Noise] If the system considered is linear with all open-loop eigenvalues unstable, the channel is an additive noise channel so that $q'_t = q_t + v_t$ for some stationary Gaussian noise, and time-invariant control policies are considered leading to a stable system, then with the more common notation of $y_t = q'_t$, the right hand side of (24) would be the sum of the logarithms of the magnitudes of the unstable eigenvalues of the linear system matrix. For a stationary Gaussian process (see [?], page 274) the entropy rate can be written as

$$\frac{1}{2} \log(2\pi e) + \int_{-1/2}^{1/2} \frac{1}{2} \log(S(f)) \, df$$

with $S$ denoting the spectral density of the process. Now, (11) becomes

$$I(q'_t; q_{[0,t]} \mid q'_{[0,t-1]}) = h(q'_t \mid q'_{[0,t-1]}) - h(q'_t \mid q_{[0,t]}, q'_{[0,t-1]}) = h(q'_t \mid q'_{[0,t-1]}) - h(v_t \mid v_{[0,t-1]}),$$

and thus the left hand side of (24) reduces to the difference between the entropy rate of the process $q'_t$ (that is, $\lim_{t \to \infty} h(q'_t \mid q'_{[0,t-1]})$) and that of the stationary noise process $v_t$ (that is, $\lim_{t \to \infty} h(v_t \mid v_{[0,t-1]})$). Then, the left hand side of (24) equals

$$\int_{-1/2}^{1/2} \frac{1}{2} \log\Big( \frac{S_{q'}(f)}{S_v(f)} \Big) \, df,$$

which then is equal to the integral of the log-sensitivity function (corresponding to the transfer function from the disturbance process $v_t$ to the output process $q'_t$). This leads to the celebrated Bode's Integral Formula. In the context of linear systems, earlier extensions of this formula have been studied in [?] with an information theoretic interpretation under the restriction to linear policies (see e.g. Theorem 4.6 in [12]), in [31] under more general possibly non-linear stabilizing control policies which lead to a stationary process, and in [?] and [?] for a class of non-linear noise-free systems. ⋄
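The entropy-rate expression for stationary Gaussian processes quoted in Remark 4 can be checked numerically on a simple case. The sketch below assumes a hypothetical AR(1) noise model $v_t = a v_{t-1} + e_t$ with $|a| < 1$ and $e_t \sim N(0, \sigma^2)$ i.i.d. (our example), whose spectral density is $S(f) = \sigma^2 / |1 - a e^{-i 2\pi f}|^2$ and whose entropy rate equals that of its innovations, $\frac{1}{2}\log_2(2\pi e \sigma^2)$. Logarithms are taken base 2 here so that the result is in bits.

```python
import math

def ar1_spectral_density(f, a, sigma2):
    """S(f) = sigma2 / |1 - a exp(-i 2 pi f)|^2 for v_t = a v_{t-1} + e_t."""
    return sigma2 / (1.0 - 2.0 * a * math.cos(2.0 * math.pi * f) + a * a)

def entropy_rate_bits(spectral_density, n=4096):
    """(1/2) log2(2 pi e) + integral over [-1/2, 1/2] of (1/2) log2 S(f), midpoint rule."""
    step = 1.0 / n
    integral = 0.0
    for k in range(n):
        f = -0.5 + (k + 0.5) * step
        integral += 0.5 * math.log2(spectral_density(f)) * step
    return 0.5 * math.log2(2.0 * math.pi * math.e) + integral

a, sigma2 = 0.8, 2.0  # illustrative parameters
numeric = entropy_rate_bits(lambda f: ar1_spectral_density(f, a, sigma2))
closed_form = 0.5 * math.log2(2.0 * math.pi * math.e * sigma2)
```

The agreement reflects the fact that the log of the AR(1) denominator integrates to zero over one period when $|a| < 1$, which is the same mechanism that makes the log-sensitivity integral vanish for stable open-loop systems.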


3 Asymptotic mean stationarity and ergodicity 

In the following, we build on, but significantly modify, the approaches in [33] and [?] to account for the non-linearity of the system.

Consider the system (6), under some admissible policy, controlled over a channel.

Assumption 3.1 We assume

$$M := \sup_{x \in \mathbb{R}^N} \log_2 |J(f(x))| < \infty,$$

$$L := \inf_{x \in \mathbb{R}^N} \log_2 |J(f(x))| > -\infty.$$

Theorem 3.1 Consider the system (6) controlled over a Class A type noisy channel with feedback where $h(x_0) < \infty$ and Assumptions 2.1 and 3.1 hold. If $C < L$, then under any admissible policy,

$$\limsup_{T \to \infty} P(|x_T| \leq b(T)) \leq 1 - \frac{L - C}{M}$$

for all $b(T) > 0$ such that $\lim_{T \to \infty} \frac{1}{T} \log_2(b(T)) = 0$.

The proof is in Section B of the Appendix. An implication of this result follows.

Theorem 3.2 Consider the system (6) controlled over a Class A type noisy channel with feedback where $h(x_0) < \infty$ and Assumptions 2.1 and 3.1 hold. If under some causal encoding and controller policy the state process is AMS, the channel capacity $C$ must satisfy $C \geq L$.

We recover the following result for linear systems in [?] as a special case.

Corollary 3.1 For the linear case with $f(x) = Ax$ with eigenvalues $|\lambda_i| \geq 1$, $C \geq \sum_i \log_2(|\lambda_i|)$ is a necessary condition for the AMS property under any admissible coding and control policy.
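The bound in Theorem 3.1 is simple to evaluate once $L$, $M$ and $C$ are known. The following sketch wraps it in a helper with basic argument checks; the function name `ams_probability_bound` is hypothetical, and the validity conditions encoded below ($0 \leq C < L \leq M$) are our reading of the hypotheses.

```python
def ams_probability_bound(L, C, M):
    """Upper bound 1 - (L - C)/M on limsup P(|x_T| <= b(T)) from Theorem 3.1.

    Assumes 0 <= C < L <= M, with L and M the infimum and supremum of
    log2|J(f(x))| and C the channel capacity in bits per channel use.
    """
    if not (0 <= C < L <= M):
        raise ValueError("the bound applies when 0 <= C < L <= M")
    return 1.0 - (L - C) / M

# Linear scalar example f(x) = 4x: L = M = log2(4) = 2. With a 1-bit channel,
# the state stays in slowly growing balls at most half of the time asymptotically.
bound = ams_probability_bound(2.0, 1.0, 2.0)
```

In the linear case $L = M$, so the bound reduces to $C / L$, the ratio of the available capacity to the required rate.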

Proof of Theorem 3.2 If the process is AMS (see Section A), then there exists a stationary measure $\bar{P}$ such that

$$\lim_{N \to \infty} \frac{1}{N} \sum_{k=1}^{N} P(T^{-k} D) = \bar{P}(D) \qquad (25)$$

for all (cylinder) events $D$. For $b_B \in \mathbb{R}_+$, let $B \in \mathcal{B}(\mathbb{R}^N)$ be given by $B = \{x : |x| \leq b_B\}$ and let $X_n(z) = z_n$ be the coordinate function (see Section A), where $z = \{z_0, z_1, z_2, \ldots\}$.

If, by Theorem 3.1,

$$\limsup_{T \to \infty} P(|x_T| \leq b_B) \leq 1 - \frac{L - C}{M} < 1 \qquad (26)$$

holds for all $b_B \in \mathbb{R}_+$, then $\bar{P}_n(B) \leq 1 - \frac{L - C}{M}$ for all compact $B$, where $\bar{P}_n$ is the marginal probability on the $n$th coordinate defined as

$$\bar{P}_n(B) = \bar{P}\big(x : |X_n(x)| \leq b_B\big).$$

But then $\bar{P}_n$, as an individual probability measure, must be tight [3]; therefore, for every $\delta > 0$ there exists $b_\delta < \infty$ such that $\bar{P}_n(B) \geq 1 - \delta$ for $B = \{x : |x| \leq b_\delta\}$. But, by (25), this would imply that $\limsup_{t \to \infty} P(T^{-t} B) = \limsup_{t \to \infty} P(x_t \in B) \geq 1 - \delta$, leading to a contradiction with (26) for $\delta < \frac{L - C}{M}$. Hence, the AMS property cannot be achieved. ⋄

We end this section with a remark. 


Remark 5 In information theory, a well-established result is that for noiseless coding of information stable sources (this includes all finite state stationary and ergodic sources) over a class of information stable noisy channels (which includes the channels we consider here), an asymptotically noise-free recovery is possible if the channel capacity is greater than the source entropy through the use of non-causal codes, see e.g. [?], [25]. However, for the problem we consider (i) the source is non-stationary and open-loop unstable, (ii) the encoding is causal, and (iii) the source process space is not finite-alphabet. Nonetheless, we see that the invariance properties of the source process do appear in the rate bounds that we obtain. ⋄


4 Stationarity and positive Harris recurrence under structured (stationary) policies

In many applications, one uses a state-space formulation for coding and control policies. In the following,
we will consider stationary update rules which have the form

$$q_t = \gamma^e(x_t, m_t), \qquad u_t = \gamma^d(m_t, q'_t), \qquad m_t = \eta(m_{t-1}, q'_{t-1}), \tag{27}$$

for functions $\gamma^e$, $\gamma^d$, and $\eta$. In the form above, $m$ is an $\mathbb{S}$-valued memory or quantizer state variable. A large
class of adaptive encoding policies has this form. This includes delta modulation, differential pulse coded
modulation (DPCM), adaptive differential pulse coded modulation (ADPCM), Goodman-Gersho type
adaptive quantizers (see e.g. [26] E3), as well as the coding schemes used for stabilization of networked 
control systems under fixed-rate codes [53]. Even further, jointly optimal source and channel codes for
zero-delay coding schemes under infinite horizon optimization criteria also have the form above (where $\mathbb{S}$
is a space of probability measures [30]). We now present a necessary structural result on the encoders.


4.1 A necessary structural result on the encoders 

Let $m_t$, the state of the encoder, take values in $\mathbb{S}$. Consider ©. A stabilizing time-invariant
encoder/decoder/controller policy given by (27), in general, cannot have $|\mathbb{S}| < \infty$.

Theorem 4.1 Consider (6) with scalar $x_t$ and $w_t$ with a probability measure $\nu$ such that it has a density
positive everywhere and $E_\nu[\gamma^{-w}] < \infty$ for some $\gamma > 1$. Suppose that there exists $K > 0$ so that

$$\inf_{x > K} \frac{df}{dx}(x) > 1$$

and $\frac{df}{dx}(x)$ is bounded. Then, a finite cardinality for $\mathbb{S}$, under (27), leads to a transient system in the sense
that

$$P_x(\tau_S < \infty) < 1,$$

where for some $s > 0$, $S = (-\infty, s)$ is an open set containing the origin, $x > s$ and $\tau_S := \inf(t > 0 : x_t \in S)$.
A similar result applies for the condition

$$\sup_{x < -K} \frac{df}{dx}(x) < -1,$$

with $S = (s, \infty)$ for some $s < 0$ and $x < s$.

Proof. Let $\inf_{x > K} \frac{df}{dx}(x) \geq a > 1$. It follows from $f(x) = f(K) + \int_K^x \frac{df}{ds}(s)\,ds$ that for some
$M < \infty$, $f(x) \geq M + ax$ for $x > K$. Since both $q'_t$ and $m_t$ can take finitely many values, there exists
$U$ such that $|u_t| \leq U$ for all $t$. With $\gamma > 1$, let a Lyapunov function be picked as $V(x) = \gamma^{-x}$,
defined for positive $x$. Now, it follows that for sufficiently large $x$: $E[V(x_{t+1}) \mid x_t = x] \leq V(x)$, since

$$E[\gamma^{-(f(x) + u_t + w_t)}] = E[\gamma^{-f(x)} \gamma^{-u_t} \gamma^{-w_t}] \leq E[\gamma^{-(M - U)} \gamma^{-(ax + w_t)}] = \gamma^{-(M - U)} \gamma^{-ax} E[\gamma^{-w}] \leq \gamma^{-x}$$

for $x \in \{x : \gamma^{(a-1)x} \geq E[\gamma^{-w}] \gamma^{-M+U}\}$. Due to the additive noise process the source can escape any bounded
interval with a non-zero probability. As a result, by Theorem 6.2.8 in m (see also Theorem 8.4.1 in EH),
transience follows. ⋄
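
The drift inequality used in this proof can be checked numerically. The sketch below is an illustration under assumptions that are not from the paper: the affine lower envelope $f(x) = M + ax$ is taken as the dynamics itself, the noise is zero-mean Gaussian with standard deviation `sigma` (so that $E[\gamma^{-w}]$ has a closed form), and the control takes its worst-case value $-U$. All function names and parameter values are hypothetical.

```python
import math


def mgf_term(gamma, sigma):
    """E[gamma**(-w)] for w ~ N(0, sigma^2), from the Gaussian moment generating function."""
    return math.exp(0.5 * (sigma * math.log(gamma)) ** 2)


def drift_threshold(a, m_const, u_max, gamma, sigma):
    """Smallest x with gamma**((a-1)x) >= E[gamma**(-w)] * gamma**(-m_const + u_max)."""
    return (math.log(mgf_term(gamma, sigma), gamma) - m_const + u_max) / (a - 1.0)


def expected_next_v(x, a, m_const, u_max, gamma, sigma):
    """Worst case of E[V(x_{t+1}) | x_t = x] for f(x) = m_const + a*x and u_t = -u_max."""
    return gamma ** (-(m_const + a * x - u_max)) * mgf_term(gamma, sigma)


# Hypothetical parameters: slope a > 1, bounded control |u| <= 2, V(x) = 2**(-x).
a, m_const, u_max, gamma, sigma = 1.5, 0.0, 2.0, 2.0, 1.0
x_star = drift_threshold(a, m_const, u_max, gamma, sigma)
for x in (x_star + 1.0, x_star + 10.0):
    # Above the threshold, V is a supermartingale-type function along the state.
    assert expected_next_v(x, a, m_const, u_max, gamma, sigma) <= gamma ** (-x)
print(x_star)
```

Above the threshold the expected value of $V$ does not increase, which is the condition the transience argument relies on; below it the inequality can fail, which is why the statement is restricted to sufficiently large $x$.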

Transience prohibits the existence of a stationary probability measure. The discussion above is parallel 
to Theorem 7.3.1 in m for linear systems. Related to the discussion above, for linear systems, the 
unboundedness of second moments in Proposition 5.1 in TOj and the transience of such a controlled state 
process was established in Theorem 4.2 in [ 66 ]. We also note that m studied conditions for stabilization 
when the control actions are uniformly bounded, the controlled multi-dimensional system is marginally 
stable and is driven by noise with unbounded support. 


4.2 Stationarity and Ergodicity 

In this section, instead of asymptotic mean stationarity, we will consider the more stringent condition of 
(asymptotic) stationarity of the controlled source process. For ease in presentation we will assume that mt 
takes values in a countable set, even though the extension to more general spaces is possible. 

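Lemma 4.1 If the channel is memoryless, the process $(x_t, m_t)$ is a Markov chain.

Proof. For any $t \in \mathbb{N}$,

$$\begin{aligned}
&P(dx_t, m_t \mid x_s, m_s, s \leq t-1) \\
&\quad = \sum_{q'_{t-1}} P(dx_t, m_t, q'_{t-1} \mid x_s, m_s, s \leq t-1) \\
&\quad = \sum_{q'_{t-1}} P\big(dx_t \mid x_{t-1}, \gamma^d(m_{t-1}, q'_{t-1})\big)\, P\big(q'_{t-1} \mid \gamma^e(x_{t-1}, m_{t-1})\big)\, P(m_t \mid q'_{t-1}, m_{t-1}) \\
&\quad = \sum_{q'_{t-1}} P(dx_t, m_t, q'_{t-1} \mid x_{t-1}, m_{t-1}) = P(dx_t, m_t \mid x_{t-1}, m_{t-1})
\end{aligned} \tag{28}$$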


where we use the fact that the channel is of class A and © and ( |27| ) . o 

In the following, we assume that the channel is memoryless. For the Markov chain $(x_t, m_t)$, let $\pi_t(B) =
P(x_t \in B)$ for all Borel $B$; that is, $\pi_t$ is the marginal occupation probability for the state process $x_t$.

Theorem 4.2 Suppose that the encoding, control and the memory update laws are given by (27). Suppose further that (i) the
system has the form $x_{t+1} = f(x_t, w_t) + Bu_t$, (ii) Assumptions 2.1 and 2.2 hold, and (iii) $h(x_0) < \infty$. For the positive Harris recurrence
of the process $(x_t, m_t)$ (which implies the existence of a unique invariant measure $\pi$ (and thus ergodicity)),
it must be that

$$C \geq \int \pi(dx) \left( \int \nu(dw) \log_2(|J(f(x,w))|) \right), \tag{29}$$

provided that $\limsup_{t \to \infty} \frac{1}{t} h(x_t) \leq 0$.

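Proof. First note that

$$\begin{aligned}
C \geq I(q_t; q'_t) &= H(q'_t) - H(q'_t \mid q_t) \\
&\geq H(q'_t \mid m_t) - H(q'_t \mid q_t) = H(q'_t \mid m_t) - H(q'_t \mid q_t, x_t, m_t) \\
&\geq H(q'_t \mid m_t) - H(q'_t \mid x_t, m_t) = I(q'_t; x_t \mid m_t).
\end{aligned}$$

Hence,

$$\begin{aligned}
C &\geq \liminf_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} I(q'_t; x_t \mid m_t) \\
&= \liminf_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \Big( h(x_t \mid m_t) - h(x_t \mid m_t, q'_t) \Big) \\
&= \liminf_{T \to \infty} \frac{1}{T} \bigg( \sum_{t=1}^{T-1} \Big( h\big(f(x_{t-1}, w_{t-1}) + Bu_{t-1} \mid m_t\big) - h(x_t \mid m_t, q'_t) \Big) + I(q'_0; x_0 \mid m_0) \bigg) \\
&= \liminf_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T-1} \Big( h\big(f(x_{t-1}, w_{t-1}) + Bu_{t-1} \mid m_t\big) - h(x_t \mid m_t, q'_t) \Big) \\
&\geq \liminf_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T-1} \Big( h\big(f(x_{t-1}, w_{t-1}) + Bu_{t-1} \mid m_t, m_{t-1}, q'_{t-1}\big) - h(x_t \mid m_t, q'_t) \Big) \\
&= \liminf_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T-1} \Big( h\big(f(x_{t-1}, w_{t-1}) + Bu_{t-1} \mid m_{t-1}, q'_{t-1}\big) - h(x_t \mid m_t, q'_t) \Big) \\
&\geq \liminf_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T-1} \Big( h\big(f(x_{t-1}, w_{t-1}) + Bu_{t-1} \mid w_{t-1}, m_{t-1}, q'_{t-1}\big) - h(x_t \mid m_t, q'_t) \Big) \\
&= \liminf_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T-1} \bigg( \int \nu(dw) \sum_{m, q'} P(m_{t-1} = m, q'_{t-1} = q') \\
&\qquad \times \Big( \int P(x_{t-1} \in dx \mid m_{t-1} = m, q'_{t-1} = q', w_{t-1} = w) \log_2(|J(f(x, w))|) \\
&\qquad\quad + h(x_{t-1} \mid m_{t-1} = m, q'_{t-1} = q', w_{t-1} = w) \Big) - h(x_t \mid m_t, q'_t) \bigg) \qquad (30) \\
&= \liminf_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T-1} \bigg( \int \nu(dw) \sum_{m, q'} P(m_{t-1} = m, q'_{t-1} = q') \\
&\qquad \times \Big( \int P(x_{t-1} \in dx \mid m_{t-1} = m, q'_{t-1} = q') \log_2(|J(f(x, w))|) \\
&\qquad\quad + h(x_{t-1} \mid m_{t-1} = m, q'_{t-1} = q') \Big) - h(x_t \mid m_t, q'_t) \bigg) \qquad (31) \\
&= \liminf_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T-1} \bigg( \int \nu(dw) \Big( \int \pi_{t-1}(dx) \log_2(|J(f(x, w))|) \Big) + h(x_{t-1} \mid m_{t-1}, q'_{t-1}) - h(x_t \mid m_t, q'_t) \bigg) \\
&= \liminf_{T \to \infty} \frac{1}{T} \bigg( \sum_{t=1}^{T-1} \int \pi_{t-1}(dx) \Big( \int \nu(dw) \log_2(|J(f(x, w))|) \Big) - h(x_{T-1} \mid m_{T-1}, q'_{T-1}) \bigg) \qquad (32) \\
&\geq \liminf_{T \to \infty} \frac{1}{T} \bigg( \sum_{t=1}^{T-1} \int \pi_{t-1}(dx) \Big( \int \nu(dw) \log_2(|J(f(x, w))|) \Big) - h(x_{T-1}) \bigg) \\
&\geq \liminf_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T-1} \int \pi_{t-1}(dx) \Big( \int \nu(dw) \log_2(|J(f(x, w))|) \Big) - \limsup_{T \to \infty} \frac{1}{T} h(x_{T-1}) \\
&\geq \liminf_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T-1} \int \pi_{t-1}(dx) \Big( \int \nu(dw) \log_2(|J(f(x, w))|) \Big) \qquad (33) \\
&= \liminf_{T \to \infty} \int \pi_0(dx)\, E_x\bigg[ \frac{1}{T} \sum_{t=1}^{T-1} \int \nu(dw) \log_2(|J(f(x_{t-1}, w))|) \bigg] \qquad (34) \\
&\geq \int \pi_0(dx) \liminf_{T \to \infty} E_x\bigg[ \frac{1}{T} \sum_{t=1}^{T-1} \int \nu(dw) \log_2(|J(f(x_{t-1}, w))|) \bigg] \qquad (35) \\
&= \int \pi_0(dx) \bigg( \int \pi(dz) \Big( \int \nu(dw) \log_2(|J(f(z, w))|) \Big) \bigg) \qquad (36) \\
&= \int \pi(dx) \Big( \int \nu(dw) \log_2(|J(f(x, w))|) \Big) \qquad (37)
\end{aligned}$$
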

In the first lines above, we use the fact that conditioning on a random variable reduces the entropy and the
update laws (27). The equality (31) holds since for every $w$, the map $f(\cdot, w)$ is invertible (here $J(f(x, w_t))$
is the Jacobian for the realized value of $w_t$) and since $w_t$ is an independent noise process, using the law of
total probability. Here (30) follows due to the independence of $w_t$, (32) follows from Fubini's Theorem since
$\log_2(|J(f(x, w))|)$ is bounded, (35) follows from Fatou's lemma given the assumption that $\log_2(|J(f(x, w))|)$
is bounded from below, and (36) follows from positive Harris recurrence (see can Theorem 4.3.1]). o 
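
The first step of the proof, $C \geq I(q_t; q'_t)$, says that no input distribution induced by the encoder can push more information through the channel than its capacity. The sketch below illustrates this for a binary symmetric channel, which is an assumed example channel and not one singled out in the paper; the helper names are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of a finite probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def mutual_information(input_dist, channel):
    """I(q; q') in bits for a discrete memoryless channel given as rows P(q' | q)."""
    n_in, n_out = len(input_dist), len(channel[0])
    output_dist = [sum(input_dist[i] * channel[i][j] for i in range(n_in))
                   for j in range(n_out)]
    cond_entropy = sum(input_dist[i] * entropy(channel[i]) for i in range(n_in))
    return entropy(output_dist) - cond_entropy


# Binary symmetric channel with crossover probability eps (illustrative value).
eps = 0.1
bsc = [[1 - eps, eps], [eps, 1 - eps]]
capacity = 1.0 - entropy([eps, 1 - eps])

for p in (0.1, 0.3, 0.5, 0.9):
    print(p, mutual_information([p, 1 - p], bsc), capacity)
```

For this channel the uniform input attains the capacity, and every other input distribution gives a strictly smaller mutual information, so the per-step information about $x_t$ that reaches the controller is capped by $C$.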


Remark 6 If one considers a more general control-affine model such as $x_{t+1} =
f(x_t, w_t) + B(u_t) x_t$, the condition would read as:

$$C \geq \int \pi(dx, m, q') \left( \int \nu(dw) \log_2\Big( \big| J\big( f(x, w) + B(\gamma^d(m, q'))\, x \big) \big| \Big) \right),$$

where $\pi$ is invariant for the (enlarged) Markov chain $(x_t, m_t, q'_t)$.


5 Discrete Noiseless Channels and a Stationary and Ergodic Construction


In this section, we provide achievability results and a stabilizing coding/control policy. As discussed earlier, 
the study of non-linear systems have typically considered noise-free controlled systems; e.g. h, m, and 
i lb . As also noted earlier, for noise-free systems, it typically suffices to only consider a sufficiently small 
invariant neighborhood of an equilibrium point to obtain stabilizability conditions which is not necessarily 
the case when the system is driven by an additive noise process. We consider such an example in the 
following. 

Theorem 5.1 Consider a non-linear system where $\{w_t\}$ is a sequence of zero-mean
Gaussian random vectors and there exists a control function $\kappa(z)$ such that $|f(x, \kappa(z))|_\infty \leq |a|\,|x - z|_\infty$ for
all $x, z \in \mathbb{R}^N$ with $\kappa(0) = 0$. For the stationarity and ergodicity of $\{x_t\}$ (and thus with a unique invariant
probability measure), it suffices that $C > N \log_2(|a|) + 1$.


Remark 7 It may be possible in general to reduce the rate requirements by the use of variable-rate encoding 
schemes; for example, if there exists a compact region outside of which the constant a can be upper bounded 
by a smaller number, a region-dependent quantization rate can be applied which can reduce the average data 
rate required for system stability. In this paper, since there is an explicit channel, our focus has been on 
fixed-rate coding schemes. ⋄


Proof. The proof follows essentially from the approach developed in [63] and m with extension to 
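non-linear analysis. Consider the case with $N = 2$. Let $\Delta > 0$ denote the bin size for a uniform quantizer
and let, for each coordinate $x^i \in \mathbb{R}$, $i = 1, 2$:

$$Q_K^{\Delta}(x^i) = \begin{cases}
\big(k - \tfrac{1}{2}(K + 1)\big)\Delta, & \text{if } x^i \in \big[(k - 1 - \tfrac{1}{2}K)\Delta, (k - \tfrac{1}{2}K)\Delta\big) \\
\tfrac{1}{2}(K - 1)\Delta, & \text{if } x^i = \tfrac{1}{2}K\Delta \\
0, & \text{if } x^i \notin \big[-\tfrac{1}{2}K\Delta, \tfrac{1}{2}K\Delta\big]
\end{cases} \tag{38}$$

and define $Q^{\Delta}(x) = (Q_K^{\Delta}(x^1), Q_K^{\Delta}(x^2))$ if $Q_K^{\Delta}(x^i) \neq 0$ for $i = 1, 2$ and $Q^{\Delta}(x) = 0$ if $Q_K^{\Delta}(x^i) = 0$ for some
$i$. Thus, the number of symbols in the image of $Q^{\Delta}$ is $K^2 + 1$ (and not $(K + 1)^2$). The quantizer outputs
are transmitted through a memoryless erasure channel, after being subjected to a bijective mapping,
which is performed by the channel encoder. The channel encoder maps the quantizer output symbols to
corresponding channel inputs $q \in \mathcal{M} := \{1, 2, \ldots, K^2 + 1\}$. A channel encoder at time $t$, denoted here by
$\mathcal{E}_t$, maps the quantizer outputs to $\mathcal{M}$ such that $\mathcal{E}_t(Q_t(x_t)) = q_t \in \mathcal{M}$. For $i = 1, 2$, let $R' = \log_2(K)$. For
$t \geq 0$ and with $\Delta_0^1 = \Delta_0^2 \in \mathbb{R}$, define

$$h_t^i = \frac{x_t^i}{\Delta_t^i 2^{R'-1}},$$

and with

$$u_t = \kappa(\hat{x}_t),$$

consider:

$$\hat{x}_t = \begin{bmatrix} \hat{x}_t^1 \\ \hat{x}_t^2 \end{bmatrix} = \begin{bmatrix} Q_K^{\Delta_t^1}(x_t^1) \\ Q_K^{\Delta_t^2}(x_t^2) \end{bmatrix} 1_{\{\max_i |h_t^i| \leq 1\}} + 0 \cdot 1_{\{\max_i |h_t^i| > 1\}}, \tag{39}$$

$$\Delta_{t+1}^1 = \Delta_t^1\, \bar{Q}(|h_t^1|, |h_t^2|, \Delta_t^1, \Delta_t^2), \qquad \Delta_{t+1}^2 = \Delta_t^2\, \bar{Q}(|h_t^1|, |h_t^2|, \Delta_t^1, \Delta_t^2), \tag{40}$$

with, for $i = 1, 2$, $\delta > 0$, $\alpha \in (0, 1)$, $L > 0$ such that

$$\bar{Q}(x, y, \Delta^1, \Delta^2) = \begin{cases}
|a| + \delta, & \text{if } |x| > 1 \text{ or } |y| > 1 \\
\alpha, & \text{if } |x| \leq 1, |y| \leq 1; \; \Delta^1 \geq L, \Delta^2 \geq L \\
1, & \text{if } |x| \leq 1, |y| \leq 1; \; \Delta^1 < L \text{ or } \Delta^2 < L
\end{cases}$$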

Note that the above imply $\Delta_t^i \geq \alpha L$. To make the state space for the bin size process countable as in m
[64], we take that $\log_2(\bar{Q}(\cdot))$ takes values in integer multiples of $s$ where the integers taken are relatively
prime (that is, they share no common divisors except for 1); see [67, Lemma 7.6.2].
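
The quantizer and the zoom rule described above can be sketched per coordinate as follows. This is a minimal illustration, not the paper's implementation: the function names, the argument names (`a_abs` for $|a|$, `delta_gain` for $\delta$) and the guard against floating-point rounding at the top bin edge are assumptions made for the example.

```python
import math


def uniform_quantizer(x, K, delta):
    """Scalar uniform quantizer with K bins of width delta on [-K*delta/2, K*delta/2].

    Returns the midpoint of the bin containing x, and 0 when x is outside the range.
    """
    half_range = 0.5 * K * delta
    if x < -half_range or x > half_range:
        return 0.0
    if x == half_range:
        return 0.5 * (K - 1) * delta
    # Bin index k in {1, ..., K}; min() guards against rounding at the top edge.
    k = min(math.floor(x / delta + 0.5 * K) + 1, K)
    return (k - 0.5 * (K + 1)) * delta


def zoom_factor(h1, h2, d1, d2, a_abs, delta_gain, alpha, L):
    """Bin-size multiplier: expand when under-zoomed, shrink by alpha when both
    bin sizes are at least L, and hold otherwise."""
    if abs(h1) > 1 or abs(h2) > 1:
        return a_abs + delta_gain
    if d1 >= L and d2 >= L:
        return alpha
    return 1.0


print(uniform_quantizer(0.3, 4, 1.0), uniform_quantizer(2.1, 4, 1.0))
```

With an even number of bins the midpoints are never zero, so the output 0 can serve unambiguously as the overflow symbol; this is why the two-dimensional image has $K^2 + 1$ symbols rather than $(K+1)^2$.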

We note the following without proof. 

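Lemma 5.1 The process $(x_t, \Delta_t)$ is a Markov chain.

We define a sequence of stopping times as follows:

$$\mathcal{T}_0 = 0, \qquad \mathcal{T}_{z+1} = \inf\{k > \mathcal{T}_z : |h_k^i| \leq 1, \; i \in \{1, 2\}\}, \qquad z \in \mathbb{Z}_+.$$

By the strong Markov property and the nature of the stopping times, $(x_{\mathcal{T}_z}, h_{\mathcal{T}_z})$ is also Markov. In the
following, we show that there exist $b_0 > 0$, $b_1 < \infty$ such that

$$E\big[\log(\Delta^2_{\mathcal{T}_{z+1}}) \mid \Delta_{\mathcal{T}_z}, h_{\mathcal{T}_z}\big] \leq \log(\Delta^2_{\mathcal{T}_z}) - b_0 + b_1 1_{\{|\Delta_{\mathcal{T}_z}| \leq F\}} \tag{41}$$

We first bound the probability $P(\mathcal{T}_{z+1} - \mathcal{T}_z \geq k \mid \Delta_{\mathcal{T}_z}, h_{\mathcal{T}_z})$ from above.

Lemma 5.2 The discrete probability measure $P(\mathcal{T}_{z+1} - \mathcal{T}_z = k \mid x_{\mathcal{T}_z}, \Delta_{\mathcal{T}_z})$ has the upper bound

$$P(\mathcal{T}_{z+1} - \mathcal{T}_z \geq k \mid x_{\mathcal{T}_z}, \Delta_{\mathcal{T}_z}) \leq M(\Delta_{\mathcal{T}_z})\, r^{-k},$$

for some $r > 1$ and $\lim_{\Delta \to \infty} M(\Delta) = 0$.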

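Proof. Observe that for $0 < k < \mathcal{T}_1$, $\hat{x}_k = 0$ and $u_k = \kappa(0) = 0$. Let $|x| = \|x\|_\infty$. Now, for $k \geq 2$,

$$\begin{aligned}
P(\mathcal{T}_1 \geq k \mid x_0, \Delta_0) &\leq P_{x_0, \Delta_0}\Big( |x_{k-1}| \geq (|a| + \delta)^{k-2} 2^{R'-1} \alpha \Delta_0 \Big) \\
&\leq P_{x_0, \Delta_0}\Big( |f(x_{k-2})| + |w_{k-2}| \geq (|a| + \delta)^{k-2} 2^{R'-1} \alpha \Delta_0 \Big) \\
&\leq P_{x_0, \Delta_0}\Big( |a|\,|x_{k-2}| + |w_{k-2}| \geq (|a| + \delta)^{k-2} 2^{R'-1} \alpha \Delta_0 \Big) \\
&\leq P_{x_0, \Delta_0}\bigg( \sum_{i=0}^{k-2} |a|^{-i-1} |w_i| \geq \frac{(|a| + \delta)^{k-2} 2^{R'-1} \alpha \Delta_0}{|a|^{k-1}} - |\hat{x}_0 - x_0| \bigg) \qquad (42) \\
&\leq P_{x_0, \Delta_0}\bigg( \sum_{i=0}^{k-2} |a|^{-i-1} |w_i| \geq \frac{(|a| + \delta)^{k-2} 2^{R'-1} \alpha \Delta_0}{|a|^{k-1}} - \frac{\Delta_0}{2} \bigg) \\
&= P_{x_0, \Delta_0}\bigg( \sum_{i=0}^{k-2} |a|^{-i-1} |w_i| \geq \frac{\Delta_0}{2} \Big( \Big(\frac{|a| + \delta}{|a|}\Big)^{k-2} \frac{2^{R'} \alpha}{|a|} - 1 \Big) \bigg) \qquad (43) \\
&\leq P_{x_0, \Delta_0}\bigg( \sum_{i=0}^{\infty} |a|^{-i-1} |w_i| \geq \frac{\Delta_0}{2} \Big( \Big(\frac{|a| + \delta}{|a|}\Big)^{k-2} \frac{2^{R'} \alpha}{|a|} - 1 \Big) \bigg) \\
&\leq \frac{E\big[\sum_{i=0}^{\infty} |a|^{-i-1} |w_i|\big]}{\frac{\Delta_0}{2} \Big( \big(\frac{|a| + \delta}{|a|}\big)^{k-2} \frac{2^{R'} \alpha}{|a|} - 1 \Big)} \qquad (44) \\
&\leq M(\Delta_0)\, r^{-k} \qquad (45)
\end{aligned}$$

with

$$M(\Delta_0) = \frac{K\, E\big[\sum_{i=0}^{\infty} |a|^{-i-1} |w_i|\big]}{\frac{\Delta_0}{2} \Big( \frac{2^{R'} \alpha}{|a|} - 1 \Big)} < \infty,$$

for some $K < \infty$ and $r \in (1, (|a| + \delta)/|a|)$ so that $\lim_{\Delta_0 \to \infty} M(\Delta_0) = 0$. Here, (42) follows from an
inductive argument, (43) follows from the fact that the term

$$2^{R'-1} \Big(\frac{|a| + \delta}{|a|}\Big)^{k-2} \frac{\alpha}{|a|} - \frac{1}{2}$$

is positive for $k \geq 2$ provided that $2^{R'} > \frac{|a|}{\alpha}$, (44) follows from Markov's inequality and (45) from
the fact that $w_i$ is Gaussian together with the property $|x| \leq N(1 + |x|^2)$ leading to the finiteness of the expectation in $M(\Delta_0)$.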

We now invoke [68, Theorem 2.1]: Let $X$ be an $\mathbb{X}$-valued Markov chain (where $\mathbb{X}$ is a standard Borel
space) and $\mathcal{T}_z$, $z \geq 0$ be a sequence of stopping times measurable on the filtration generated by the state
process with $\mathcal{T}_0 = 0$.

Theorem 5.2 [68, Theorem 2.1] Suppose that $X$ is a $\psi$-irreducible and aperiodic Markov chain. Suppose
moreover that there are functions $V : \mathbb{X} \to (0, \infty)$, $\delta : \mathbb{X} \to [1, \infty)$, $f : \mathbb{X} \to [1, \infty)$, a small set $C$ on which
$V$ is bounded, and a constant $b \in \mathbb{R}$, such that the following hold:

$$E\big[V(x_{\mathcal{T}_{z+1}}) \mid \mathcal{F}_{\mathcal{T}_z}\big] \leq V(x_{\mathcal{T}_z}) - \delta(x_{\mathcal{T}_z}) + b 1_{\{x_{\mathcal{T}_z} \in C\}}$$

$$E\bigg[\sum_{k=\mathcal{T}_z}^{\mathcal{T}_{z+1}-1} f(x_k) \;\Big|\; \mathcal{F}_{\mathcal{T}_z}\bigg] \leq \delta(x_{\mathcal{T}_z}), \qquad z \geq 0. \tag{46}$$

Then the following hold:

(i) $X$ is positive Harris recurrent, with unique invariant distribution $\pi$

(ii) $\pi(f) := \int f(x)\, \pi(dx) < \infty$

(iii) For any function $g$ that is bounded by $f$, in the sense that $\sup_x |g(x)|/f(x) < \infty$, we have convergence
in the mean, and the Law of Large Numbers holds:

$$\lim_{t \to \infty} E_x[g(x_t)] = \pi(g)$$

$$\lim_{N \to \infty} \frac{1}{N} \sum_{t=0}^{N-1} g(x_t) = \pi(g) \quad \text{a.s.}, \; x \in \mathbb{X}$$

By taking $f(x) = 1$ for all $x \in \mathbb{X}$, the following holds.

Theorem 5.3 [68] Suppose that $X$ is a $\psi$-irreducible Markov chain with natural filtration $\mathcal{F}_t$. Suppose
moreover that there is a function $V : \mathbb{X} \to (0, \infty)$, a small set $C$ on which $V$ is bounded, and a constant
$b \in \mathbb{R}$, such that the following hold:

$$E\big[V(x_{\mathcal{T}_{z+1}}) \mid \mathcal{F}_{\mathcal{T}_z}\big] \leq V(x_{\mathcal{T}_z}) - 1 + b 1_{\{x_{\mathcal{T}_z} \in C\}}$$

$$\sup_{z \geq 0} E\big[\mathcal{T}_{z+1} - \mathcal{T}_z \mid \mathcal{F}_{\mathcal{T}_z}\big] < \infty. \tag{47}$$

Then $X$ is positive Harris recurrent.

Now, with the candidate Lyapunov function $V_0(x_t, \Delta_t) = \log(\Delta_t^2)$, for $\Delta_{\mathcal{T}_z} > L$,

$$\begin{aligned}
E\big[V_0(x_{\mathcal{T}_{z+1}}, \Delta_{\mathcal{T}_{z+1}}) \mid x_{\mathcal{T}_z}, \Delta_{\mathcal{T}_z}\big] &= P(\mathcal{T}_{z+1} - \mathcal{T}_z = 1)\big(2\log(\alpha) + \log(\Delta^2_{\mathcal{T}_z})\big) \\
&\quad + \sum_{k=2}^{\infty} \log(\Delta^2_{\mathcal{T}_z + k})\, P(\mathcal{T}_{z+1} - \mathcal{T}_z = k \mid x_{\mathcal{T}_z}, \Delta_{\mathcal{T}_z}) \\
&\leq P(\mathcal{T}_{z+1} - \mathcal{T}_z = 1)\big(2\log(\alpha) + \log(\Delta^2_{\mathcal{T}_z})\big) \\
&\quad + \sum_{k=2}^{\infty} 2\big(\log(\alpha) + (k - 1)\log(|a| + \delta) + \log(\Delta_{\mathcal{T}_z})\big)\, M(\Delta_{\mathcal{T}_z})\, r^{-k}
\end{aligned}$$

Now, by (45), $\lim_{\Delta_0 \to \infty} P(\mathcal{T}_{z+1} - \mathcal{T}_z = 1 \mid \Delta_0, x_0) = 1$ uniformly in $|x_0| \leq 2^{R'-1}\Delta_0$. As a result, the drift
condition of Theorem 5.3 holds. We need to ensure, however, the small/petite set [37] property of compact
sets to establish positive Harris recurrence. A sufficiently small compact set for this chain is petite due to
the countability of the values that $\Delta_t$ takes and the uniform countable additivity property of the Markov
chain due to the presence of the additive Gaussian noise, as in p. 206 of [67], and the continuity of $f$ in
$x$. This argument applies for $N$-dimensional systems as well with $N > 2$. This completes the proof of
Theorem 5.1. ⋄
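
To make the construction concrete, the following sketch simulates a scalar analogue of the scheme. It is an illustration under assumptions that are not from the paper: the system is linear, $x_{t+1} = a x_t + u_t + w_t$ with $\kappa(z) = -a z$ (so that $f(x, \kappa(z)) = a(x - z)$), the noise is standard Gaussian, the channel is noiseless, and all parameter values are hypothetical. The number of bins satisfies $K > |a|/\alpha$.

```python
import math
import random


def simulate(a=2.0, K=8, alpha=0.5, delta_gain=0.1, L=1.0, bin0=2.0, steps=2000, seed=0):
    """Scalar closed loop with a zooming uniform quantizer (noiseless channel)."""
    rng = random.Random(seed)
    x, bin_size = 0.0, bin0
    states, bins = [], []
    for _ in range(steps):
        h = x / (bin_size * K / 2.0)          # normalized state; |h| <= 1 means "in range"
        if abs(h) <= 1.0:
            k = min(math.floor(x / bin_size + K / 2.0) + 1, K)
            x_hat = (k - 0.5 * (K + 1)) * bin_size
            factor = alpha if bin_size >= L else 1.0   # zoom in, but not below alpha * L
        else:
            x_hat = 0.0                                 # overflow symbol
            factor = abs(a) + delta_gain                # zoom out
        u = -a * x_hat                                  # u_t = kappa(x_hat_t)
        x = a * x + u + rng.gauss(0.0, 1.0)
        bin_size *= factor
        states.append(x)
        bins.append(bin_size)
    return states, bins


states, bins = simulate()
print(min(bins), max(abs(s) for s in states))
```

The bin size never drops below $\alpha L$ when it starts at or above $L$, which mirrors the lower bound noted after the update rule; the printed maximum of $|x_t|$ depends on the random seed and is shown only to indicate that the trajectory remains finite.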


Remark 8 The approach adopted in the proof of Theorem 5.1 applies for more general channels (such as 
erasure channels or discrete memoryless channels) subject to more tedious error bounds. ⋄


6 Discussion and conclusion 

In this paper, conditions on information channels leading to the stochastic stability of non-linear systems
controlled over noisy channels have been investigated. Stochastic stability notions considered were asymptotic
mean stationarity, ergodicity and stationarity. Results for linear systems are recovered as a special
case.

In the following we present some future directions and a comparison with the results involving topological
entropy.

6.1 Comparison with invariance entropy and deterministic non-linear systems controlled over noiseless channels

As noted earlier, noise-free systems and noiseless discrete channels have been studied in the literature in 
the context of topological entropy and invariance entropy. Here, we establish some connections. 

One related result in this literature is with regard to stabilization to a point: Under the assumptions
that (i) $f$ has the form in (7) (without noise) with continuous partial derivatives, (ii) there exists a fixed
point (equilibrium) $x^*$ so that $x^* = f(x^*, u^*)$, (iii) a local strong invariability condition is satisfied which
relates the size of an invariant set and the size of a control action set in the sense that for any $\epsilon > 0$, there
exists $\rho > 0$ so that for all $\epsilon' \in (0, \rho]$, the set $\{x : |x - x^*| \leq \epsilon'\}$ is strongly invariant with the control action
set $U = \{u : |u - u^*| \leq \epsilon\}$, and (iv) the pair $(A, B)$ is controllable where $A, B$ are the Jacobians of $f$ with
respect to state and control at x*,u*, m has reported that for convergence to the equilibrium an average 
rate $R > \sum_{|\lambda_i| > 1} \log_2(|\lambda_i|)$ is sufficient, where $\lambda_i$ are the eigenvalues of the Jacobian at the equilibrium
point.

A further related result in spirit to our paper is on a case where there exists an invariant set with a
non-empty interior: For continuous-time systems of the form $\frac{dx}{dt} = f(x, u)$, $u \in U$, Colonius and Kawan [6]
establish a lower bound on invariance entropy as

$$\max\Big\{0, \min_{(x, u) \in Q \times U} \sum_{i=1}^{n} \frac{\partial f_i}{\partial x_i}(x, u)\Big\} \tag{48}$$

where $Q$ is a weakly invariant set and $f_i$ is the $i$th coordinate function of $f$. More refined bounds are
present if further structural properties are imposed: in [53], under a uniform hyperbolicity assumption (see
[53, Definition 4.4]), Theorem 4.8 states a similar lower bound by considering the unstable components in
an invariant set.


These results can be viewed to be related to Theorem 3.2 and Theorem 4.2, as well as Theorem 5.1,
in that the average entropy growth as measured by the eigenvalues of the Jacobian matrix under the
invariant probability measure is lower bounded by a minimum over the elements in the support set, or is
upper bounded by a maximizing element in the support set. In the stabilization to the point example of
E 3 , the invariant measure is a delta measure on a single point. In the invariant set example leading to
(48), the set $Q$ can be viewed to be the support set of some invariant measure under the system dynamics
if such a measure were to exist. Likewise, [29] and [28] have obtained conditions for noise-free systems
controlled over noiseless channels. Due to the absence of noise, one could identify an invariant compact set, 
and consider a bound on the Lipschitz growth parameter for the system over this invariant set to obtain 
sufficiency conditions. When the system is (Lebesgue) irreducible, however, due to the effect of noise, 
local properties are not descriptive and the invariant probability measure reflects the rate conditions and 
entropy growth in the system. In this case, the local growth integrated under an invariant measure gives 
a proper bound. 

Differential entropy is a useful measure for how much a stochastic system generates uncertainty; however,
our analysis does not distinguish between the stable and unstable modes of a controlled system and is only
able to resemble the classical results in ergodic theory (Pesin's formula [62]) for expanding systems, and
thus, with only positive Lyapunov exponents. In the linear case, the arguments follow by restricting the
state space to those corresponding to the unstable modes. For a general non-linear system, however, 
a careful geometric study needs to be done. On the other hand, for deterministic systems, under a 
topological entropy formulation, the rate of growth can be measured by local Jacobian matrices, but such 
a topological discussion requires further geometric analysis with regard to the use of appropriate metrics, 
as studied extensively in [24]. Thus, the connection between the differential entropy method and geometric 
approaches requires some further study. 

We note also that recently a metric entropy generalization of some of the results in [21] has been
developed [5].


6.2 Some open directions on stationary coding and control policies and information 
theory 

It would be interesting to show, for a class of systems, that stationary coding and control policies can 
be used to arrive at stability with a stationary closed-loop process provided that the capacity of the
channel satisfies the entropy growth bound and the channel satisfies certain ergodicity conditions. However, 
except for linear Gaussian systems controlled over Gaussian channels and erasure channels (see m for a 
detailed discussion for both setups), this question has not been answered even for linear systems controlled 
over general discrete memoryless channels (that is, non-stationary coding schemes have been used for 
more general discrete memoryless channels). Furthermore, the tightness of the converse results is another 
direction. A further direction is the causal coding problem for non-ergodic sources: In the information 
theory literature, through non-causal codes, a class of source coding theorems for non-ergodic sources exist 
(see e.g. HZ!), however, the extensions of these for even control-free non-linear systems under causal coding 
require further research.


A Stationary, ergodic, and asymptotically mean stationary processes

In this subsection, we review ergodic theory in the context of information theory (that is, with the transformations
being specific to the shift operation). A comprehensive discussion is available in Shields [50],
Gray m, [19] , and Appendix C in [67] , 

Let $\mathbb{X}$ be a complete, separable, metric space. Let $\mathcal{B}(\mathbb{X})$ denote the Borel sigma-field of subsets of $\mathbb{X}$.
Let $\Sigma = \mathbb{X}^{\infty}$ denote the sequence space of all one-sided or two-sided infinite sequences drawn from $\mathbb{X}$.
Thus, for a two-sided sequence space, if $x \in \Sigma$ then $x = \{\ldots, x_{-1}, x_0, x_1, \ldots\}$ with $x_i \in \mathbb{X}$. Let $X_n : \Sigma \to \mathbb{X}$
denote the coordinate function such that $X_n(x) = x_n$. Let $T$ denote the shift operation on $\Sigma$, that is,
$X_n(Tx) = x_{n+1}$. That is, for a one-sided sequence space, $T(x_0, x_1, x_2, \ldots) = (x_1, x_2, x_3, \ldots)$.

Let $\mathcal{B}(\Sigma)$ denote the smallest sigma-field containing all cylinder sets of the form $\{x : x_i \in B_i, m \leq i \leq n\}$
where $B_i \in \mathcal{B}(\mathbb{X})$, for all integers $m, n$. Observe that $\cap_{n \geq 0} T^{-n} \mathcal{B}(\Sigma)$ is the tail $\sigma$-field $\cap_{n \geq 0} \sigma(x_n, x_{n+1}, \ldots)$,
since $T^{-n}(A) = \{x : T^n x \in A\}$.

Let $\mu$ be a stationary measure on $(\Sigma, \mathcal{B}(\Sigma))$ in the sense that $\mu(T^{-1}B) = \mu(B)$ for all $B \in \mathcal{B}(\Sigma)$. Then,
the sequence of random variables $\{x_n\}$ defined on the probability space $(\Sigma, \mathcal{B}(\Sigma), \mu)$ is a stationary process.

Definition A.1 Let $\mu$ be the measure on a process. This random process is ergodic if $A = T^{-1}A$ implies
that $\mu(A) \in \{0, 1\}$.

That is, the events that are unchanged with a shift operation are trivial events.

Mixing is a sufficient condition for ergodicity. Thus, a source is ergodic if $\lim_{n \to \infty} P(A \cap T^{-n}B) =
P(A)P(B)$, since the process forgets its initial condition. For the special case of Markov sources, we have
the following: A positive Harris recurrent Markov chain is ergodic, since such a process is mixing and
stationary.

Definition A.2 A process on a probability space $(\Omega, \mathcal{F}, P)$ with process measure $P$ is asymptotically mean
stationary (AMS) if there exists a probability measure $\bar{P}$ such that

$$\lim_{N \to \infty} \frac{1}{N} \sum_{k=0}^{N-1} P(T^{-k}F) = \bar{P}(F),$$

for all events $F \in \mathcal{B}(\Sigma)$. Here $\bar{P}$ is called the stationary mean of $P$, and is a stationary measure.

Note that $\bar{P}$ is stationary since, by definition, $\bar{P}(F) = \bar{P}(T^{-1}F)$. For the importance of the AMS property,
its relations with Birkhoff’s ergodic theorem, some applications and sufficient conditions, please see m or 

m- 
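
A simple way to see the difference between stationarity and the AMS property is a two-state chain that alternates deterministically between its states. The sketch below, an illustration with hypothetical names, computes the Cesàro averages $\frac{1}{N}\sum_{k=0}^{N-1} P(x_k = 0)$, which correspond to the AMS definition applied to the event $F = \{x_0 = 0\}$.

```python
def cesaro_means(initial, transition, horizon):
    """Running averages (1/N) * sum_{k=0}^{N-1} P(x_k = 0) for a two-state Markov chain."""
    dist = list(initial)
    running_sum, means = 0.0, []
    for n in range(1, horizon + 1):
        running_sum += dist[0]
        means.append(running_sum / n)
        # One-step update of the marginal distribution.
        dist = [dist[0] * transition[0][0] + dist[1] * transition[1][0],
                dist[0] * transition[0][1] + dist[1] * transition[1][1]]
    return means


flip = [[0.0, 1.0], [1.0, 0.0]]   # deterministic alternation between states 0 and 1
means = cesaro_means([1.0, 0.0], flip, 1000)
print(means[0], means[1], means[-1])
```

The marginals $P(x_k = 0)$ oscillate between 1 and 0 and never converge, so the process started in state 0 is not stationary, yet the averages converge to $1/2$, the value assigned by the stationary mean.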


B Proof of Theorem 3.1

Define the event, for $K > 0$ so that $P(|x_0| \leq K) > 0$, as

$$S_\eta^K = \{\omega : |x_0| \leq K, \; w = \eta, \text{ i.e., } w_k = \eta_k, \; \eta_k \in \mathbb{R}^p, \; k \geq 0\},$$

such that the noise realizations are fixed and deterministic. In the following, we will drop the subscript
and superscripts and let $P_S$ or $P(\cdot \mid S)$ denote the conditional probabilities given the event $S$. We recall
here that $\{w_t, t \geq 0\}$ and $x_0$ are assumed to be independent. By Definition 1.1, first note that the capacity
expression satisfies


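$$C = \lim_{T \to \infty} \max_{\{P(q_t \mid q_{[0,t-1]}, q'_{[0,t-1]})\}} \frac{1}{T} I(q_{[0,T-1]} \to q'_{[0,T-1]}) = \lim_{T \to \infty} \max_{\{P(q_t \mid q_{[0,t-1]}, q'_{[0,t-1]})\}} \frac{1}{T} I(q_{[0,T-1]} \to q'_{[0,T-1]} \mid S) \tag{49}$$

where the conditional directed information is given by

$$I(q_{[0,T-1]} \to q'_{[0,T-1]} \mid S) = \sum_{t=1}^{T-1} I(q_{[0,t]}; q'_t \mid q'_{[0,t-1]}, S) + I(q_0; q'_0 \mid S).$$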


Here, (49) is a result of the following: Consider an encoder policy given by

$$P^* = \{P^*(q_0), P^*(q_1 \mid q_0, q'_0), \ldots, P^*(q_t \mid q_{[0,t-1]}, q'_{[0,t-1]}), \ldots\}.$$

For any $t \in \mathbb{N}$, almost surely the following holds:

$$\begin{aligned}
&P(q'_t \mid q'_{[0,t-1]}, S) \\
&\quad = \sum_{q_{[0,t]}} P(q'_t \mid q_{[0,t]}, q'_{[0,t-1]}, S)\, P(q_{[0,t]} \mid q'_{[0,t-1]}, S) \\
&\quad = \sum_{q_{[0,t]}} P(q'_t \mid q_{[0,t]}, q'_{[0,t-1]})\, P(q_{[0,t]} \mid q'_{[0,t-1]}, S) \qquad (50) \\
&\quad = \sum_{q_{[0,t]}} P(q'_t \mid q_{[0,t]}, q'_{[0,t-1]})\, P^*(q_t \mid q_{[0,t-1]}, q'_{[0,t-1]}, S)\, P(q_{[0,t-1]} \mid q'_{[0,t-1]}, S) \\
&\quad = \sum_{q_{[0,t]}} P(q'_t \mid q_{[0,t]}, q'_{[0,t-1]})\, P^*(q_t \mid q_{[0,t-1]}, q'_{[0,t-1]})\, P(q_{[0,t-1]} \mid q'_{[0,t-1]}, S) \qquad (51) \\
&\quad = \sum_{q_{[0,t]}} P(q'_t \mid q_{[0,t]}, q'_{[0,t-1]})\, P^*(q_t \mid q_{[0,t-1]}, q'_{[0,t-1]})\, P(q_{[0,t-1]} \mid q'_{[0,t-1]}) \qquad (52) \\
&\quad = P(q'_t \mid q'_{[0,t-1]}) \qquad (53)
\end{aligned}$$

where (50) follows from Definition 1.1, (51) from the structure of a coding policy, and (52) from the following
inductive argument. Note that $P(q_0, q'_0 \mid S) = P(q_0, q'_0)$. If $P(q_{[0,t-1]}, q'_{[0,t-1]} \mid S) = P(q_{[0,t-1]}, q'_{[0,t-1]})$, it
follows that

$$\begin{aligned}
&P(q_{[0,t]}, q'_{[0,t]} \mid S) \\
&\quad = P(q'_t \mid q_{[0,t]}, q'_{[0,t-1]}, S)\, P^*(q_t \mid q_{[0,t-1]}, q'_{[0,t-1]}, S)\, P(q_{[0,t-1]}, q'_{[0,t-1]} \mid S) \\
&\quad = P(q'_t \mid q_{[0,t]}, q'_{[0,t-1]})\, P^*(q_t \mid q_{[0,t-1]}, q'_{[0,t-1]})\, P(q_{[0,t-1]}, q'_{[0,t-1]} \mid S) \\
&\quad = P(q'_t \mid q_{[0,t]}, q'_{[0,t-1]})\, P^*(q_t \mid q_{[0,t-1]}, q'_{[0,t-1]})\, P(q_{[0,t-1]}, q'_{[0,t-1]}) \\
&\quad = P(q_{[0,t]}, q'_{[0,t]}) \qquad (54)
\end{aligned}$$

As a result, (52) simplifies to (53) by eliminating the conditioning on $S$ and (49) holds.


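We now use a similar argument as in (20), but need to modify the steps due to the conditioning on $S$:

$$\begin{aligned}
\lim_{T \to \infty} R_T &\geq \limsup_{T \to \infty} \frac{1}{T} \bigg( \sum_{t=1}^{T-1} I(x_t; q'_t \mid q'_{[0,t-1]}, S) + I(x_0; q'_0 \mid S) \bigg) \\
&= \limsup_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T-1} I(x_t; q'_t \mid q'_{[0,t-1]}, S) \\
&= \limsup_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T-1} \Big( h_S(x_t \mid q'_{[0,t-1]}) - h_S(x_t \mid q'_{[0,t]}) \Big) \\
&= \limsup_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T-1} \Big( h_S\big(f(x_{t-1}) + Bu_{t-1} + w_{t-1} \mid q'_{[0,t-1]}\big) - h_S(x_t \mid q'_{[0,t]}) \Big) \\
&= \limsup_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T-1} \Big( h_S\big(f(x_{t-1}) + Bu_{t-1} \mid q'_{[0,t-1]}\big) - h_S(x_t \mid q'_{[0,t]}) \Big) \\
&= \limsup_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T-1} \Big( h_S\big(f(x_{t-1}) \mid q'_{[0,t-1]}\big) - h_S(x_t \mid q'_{[0,t]}) \Big) \\
&= \limsup_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T-1} \bigg( E\Big[ \int P_S(dx_{t-1} \mid q'_{[0,t-1]}) \log_2(|J(f(x_{t-1}))|) \Big] \\
&\qquad\qquad + h_S(x_{t-1} \mid q'_{[0,t-1]}) - h_S(x_t \mid q'_{[0,t]}) \bigg) \qquad (55) \\
&\geq V_S - \liminf_{T \to \infty} \frac{1}{T} h_S(x_{T-1} \mid q'_{[0,T-1]}) \qquad (56)
\end{aligned}$$

Here,

$$V_S = \liminf_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T-1} E\Big[ \int P_S(dx_{t-1} \mid q'_{[0,t-1]}) \log_2(|J(f(x_{t-1}))|) \Big] \tag{57}$$

and (55) follows from the fact that

$$h_S\big(f(x_{t-1}) \mid q'_{[0,t-1]}\big) = E\Big[ \int P_S(dx_{t-1} \mid q'_{[0,t-1]}) \log_2(|J(f(x_{t-1}))|) \Big] + h_S(x_{t-1} \mid q'_{[0,t-1]}),$$

where the expectation is over the realizations of $q'_{[0,t-1]}$. Finally, we use the boundedness of $h(x_0)$ (and
thus $h(x_0 \mid q'_0)$) in (56). Thus, with $V_S \geq L$, it follows that

$$\liminf_{T \to \infty} \frac{1}{T} h_S(x_{T-1} \mid q'_{[0,T-1]}) \geq L - C$$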

We now seek to obtain an upper bound on $h_S(x_{T-1} \mid q'_{[0,T-1]})$. As in [33], note that

$$h_S(x_T \mid q'_{[0,T]}) \leq h_S(x_T, y \mid q'_{[0,T]}),$$

where $y$ is a binary random variable which is 1 if $|x_T| \leq b(T)$ and 0 otherwise. Let

$$P_S(y = 1) = P_S(|x_T| \leq b(T)) =: p_T^S. \tag{58}$$

Then,

$$\begin{aligned}
h_S(x_T, y \mid q'_{[0,T]}) &= h_S(x_T \mid q'_{[0,T]}, y) + H_S(y \mid q'_{[0,T]}) \\
&\leq h_S(x_T \mid q'_{[0,T]}, y) + 1,
\end{aligned}$$

since $y$ is binary. We have that

$$h_S(x_T \mid q'_{[0,T]}, y) \leq p_T^S \frac{N}{2} \log_2(2\pi e\, b^2(T)) + (1 - p_T^S)\, h_S\big(x_T \mid q'_{[0,T]}, |x_T| > b(T)\big)$$

and

$$\begin{aligned}
h_S\big(x_T \mid q'_{[0,T]}, |x_T| > b(T)\big) &= h_S\big(f(x_{T-1}) + Bu_{T-1} + w_{T-1} \mid q'_{[0,T]}, |x_T| > b(T)\big) \\
&\leq h_S\big(f(x_{T-1}) + Bu_{T-1} + w_{T-1} \mid q'_{[0,T-1]}, |x_T| > b(T)\big) \qquad (59) \\
&= h_S\big(f(x_{T-1}) + w_{T-1} \mid q'_{[0,T-1]}, |x_T| > b(T)\big) \\
&= h_S\big(f(x_{T-1}) \mid q'_{[0,T-1]}, |x_T| > b(T)\big) \qquad (60) \\
&= E\Big[ \int P_S\big(dx_{T-1} \mid q'_{[0,T-1]}, |x_T| > b(T)\big) \log_2(|J(f(x_{T-1}))|) \Big] \\
&\quad + h_S\big(x_{T-1} \mid q'_{[0,T-1]}, |x_T| > b(T)\big) \qquad (61) \\
&\leq M + h_S\big(x_{T-1} \mid q'_{[0,T-1]}, |x_T| > b(T)\big) \\
&\leq MT + h_S\big(x_0 \mid |x_T| > b(T)\big) \qquad (62)
\end{aligned}$$

Here (59) follows from the fact that conditioning on a random variable reduces the differential entropy, and (60)
follows due to the fact that $S$ determines the noise realizations. We note that the non-linearity of $f$ adds further
technical issues when compared with the linear setup.¹ Here, $M$ is the supremum of $\log_2(|J(f(x))|)$. In
the above derivation in (61), we use the fact that $f$ is invertible. In the last inequality, we use the fact that
the entropy of a random variable with a fixed covariance is upper bounded by the entropy of a Gaussian
with the same covariance, and that $|x_0|$ conditioned on $S$ is upper bounded by $K$.
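
The Gaussian maximum-entropy bound used in the last step can be illustrated in the scalar case. The sketch below is only an example and rests on an assumed distribution: it takes a variable uniformly distributed on $[-K, K]$, whose differential entropy is $\log_2(2K)$ bits and whose variance is $K^2/3$, and compares it with the Gaussian entropy at the same variance and with the looser bound $\frac{1}{2}\log_2(2\pi e K^2)$ that uses only $|x| \leq K$. The function names are hypothetical.

```python
import math


def uniform_entropy_bits(K):
    """Differential entropy (bits) of a uniform distribution on [-K, K]."""
    return math.log2(2.0 * K)


def gaussian_entropy_bits(variance):
    """Differential entropy (bits) of a scalar Gaussian with the given variance."""
    return 0.5 * math.log2(2.0 * math.pi * math.e * variance)


for K in (0.5, 1.0, 3.0, 100.0):
    h_uniform = uniform_entropy_bits(K)
    h_gauss_same_var = gaussian_entropy_bits(K * K / 3.0)
    h_gauss_loose = gaussian_entropy_bits(K * K)
    assert h_uniform <= h_gauss_same_var <= h_gauss_loose
    print(K, h_uniform, h_gauss_same_var, h_gauss_loose)
```

The gap between the first two columns is a constant (about 0.25 bits) for every $K$, which shows that the Gaussian bound is tight up to a constant for bounded variables; in the proof, only the growth of the bound in $b(T)$ matters.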

Thus, by (58), (59) and (62) we have

$$\liminf_{T \to \infty} \frac{1}{T} \bigg\{ 1 + (1 - p_T^S) \Big( MT + h_S\big(x_0 \mid |x_T| > b(T)\big) \Big) + p_T^S \frac{N}{2} \log_2(2\pi e\, b^2(T)) \bigg\} \geq L - C. \tag{63}$$

¹ Two technical intricacies here are as follows: For differential entropy (unlike discrete entropy) the relationship $h(x + y) \leq
h(x) + h(y)$ does not in general hold for random variables $x, y$; this is why first a conditioning on $S$ is taken in the proof.
Furthermore, we cannot obtain an upper bound by taking out the conditioning on the event $|x_T| > b(T)$, since conditioning
on a single event may decrease or increase entropy; note that conditioning on a random variable, however, does not increase
the entropy.

Since $h_S\big(x_0 \mid |x_T| > b(T)\big) \leq \frac{N}{2} \log_2(2\pi e K^2)$, it follows that for all $K$ and $\eta$

$$\limsup_{T \to \infty} P_{S_\eta^K}(|x_T| \leq b(T)) \leq \frac{M - (L - C)}{M}$$

for all $b(T)$ such that $\lim_{T \to \infty} \log_2(b(T))/T = 0$. But now

$$\begin{aligned}
&\limsup_{T \to \infty} P(|x_T| \leq b(T)) \\
&\quad \leq \limsup_{T \to \infty} P(|x_T| \leq b(T), |x_0| \leq K) + \limsup_{T \to \infty} P(|x_T| \leq b(T), |x_0| > K) \\
&\quad \leq \limsup_{T \to \infty} P(|x_T| \leq b(T), |x_0| \leq K) + P(|x_0| > K) \\
&\quad \leq \limsup_{T \to \infty} \int P(d\eta)\, P_{S_\eta^K}(|x_T| \leq b(T)) + P(|x_0| > K) \\
&\quad \leq \int P(d\eta) \limsup_{T \to \infty} P_{S_\eta^K}(|x_T| \leq b(T)) + P(|x_0| > K) \qquad (64) \\
&\quad \leq \int P(d\eta)\, \frac{M - (L - C)}{M} + P(|x_0| > K) \\
&\quad = \frac{M - (L - C)}{M} + P(|x_0| > K)
\end{aligned}$$

where we use Fatou's lemma in (64) and the fact that (63) holds for every restriction of the noise realizations
$\eta$ and $K$ values. Since an individual probability measure is tight, $\lim_{K \to \infty} P(|x_0| > K) = 0$, the right hand
side can be made arbitrarily close to $\frac{M - (L - C)}{M}$ and the result follows. ⋄